PATENT COOPERATION TREATY



From the INTERNATIONAL BUREAU 



PCT 

NOTIFICATION OF ELECTION 

(PCT Rule 61.2) 


To: 

Assistant Commissioner for Patents 
United States Patent and Trademark
Office
Box PCT
Washington, D.C. 20231
ETATS-UNIS D'AMERIQUE 

in its capacity as elected Office 


Date of mailing (day/month/year) 

08 November 1999 (08.11.99) 




International application No. 
PCT/JP99/01481 


Applicant's or agent's file reference 
JA908155 


International filing date (day/month/year) 
24 March 1999 (24.03.99) 


Priority date (day/month/year) 
24 March 1998 (24.03.98) 


Applicant 

MORI, Satoshi et al 



1. The designated Office is hereby notified of its election made: 

| X | in the demand filed with the International Preliminary Examining Authority on:
20 October 1999 (20.10.99) 

| | in a notice effecting later election filed with the International Bureau on: 



2. The election | X | was

| | was not

made before the expiration of 19 months from the priority date or, where Rule 32 applies, within the time limit under
Rule 32.2(b).





Authorized officer 


The International Bureau of WIPO 


34, chemin des Colombettes 


Maria Kirchner 


1211 Geneva 20, Switzerland 




Facsimile No.: (41-22) 740.14.35 


Telephone No.: (41-22)338.83.38 


Form PCT/IB/331 (July 1992) 
From the INTERNATIONAL BUREAU


PCT 

NOTICE INFORMING THE APPLICANT OF THE 
COMMUNICATION OF THE INTERNATIONAL 
APPLICATION TO THE DESIGNATED OFFICES 

(PCT Rule 47.1(c), first sentence) 


To: 

SAEKI, Norio 

Taka-ai Building, 9th floor 

15-2, Nihonbashi 3-chome 

Chuo-ku
Tokyo 103-0027
JAPON


Date of mailing (day/month/year) 

30 September 1999 (30.09.99) 




Applicant's or agent's file reference 
JA908155 


IMPORTANT NOTICE 


International application No. 
PCT/JP99/01481 


International filing date (day/month/year) 
24 March 1999 (24.03.99) 


Priority date (day/month/year) 
24 March 1998 (24.03.98) 


Applicant 

JAPAN SCIENCE AND TECHNOLOGY CORPORATION et al 



1. Notice is hereby given that the International Bureau has communicated, as provided in Article 20, the international application
to the following designated Offices on the date indicated above as the date of mailing of this Notice: 

AU,EP,US 

In accordance with Rule 47.1(c), third sentence, those Offices will accept the present Notice as conclusive evidence that
the communication of the international application has duly taken place on the date of mailing indicated above and no copy 
of the international application is required to be furnished by the applicant to the designated Office(s). 

2. The following designated Offices have waived the requirement for such a communication at this time: 

CA 



The communication will be made to those Offices only upon their request. Furthermore, those Offices do not require the 
applicant to furnish a copy of the international application (Rule 49.1(a-bis)).

3. Enclosed with this Notice is a copy of the international application as published by the International Bureau on 
30 September 1999 (30.09.99) under No. WO 99/48356 

REMINDER REGARDING CHAPTER II (Article 31(2)(a) and Rule 54.2) 

If the applicant wishes to postpone entry into the national phase until 30 months (or later in some Offices) from the priority 
date, a demand for international preliminary examination must be filed with the competent International Preliminary 
Examining Authority before the expiration of 19 months from the priority date. 

It is the applicant's sole responsibility to monitor the 19-month time limit.

Note that only an applicant who is a national or resident of a PCT Contracting State which is bound by Chapter II has the 
right to file a demand for international preliminary examination. 



REMINDER REGARDING ENTRY INTO THE NATIONAL PHASE (Article 22 or 39(1)) 

If the applicant wishes to proceed with the international application in the national phase, he must, within 20 months 
or 30 months, or later in some Offices, perform the acts referred to therein before each designated or elected Office. 

For further important information on the time limits and acts to be performed for entering the national phase, see the 
Annex to Form PCT/IB/301 (Notification of Receipt of Record Copy) and Volume II of the PCT Applicant's Guide. 



The International Bureau of WIPO 


Authorized officer 






34, chemin des Colombettes


J. Zahra 




1211 Geneva 20, Switzerland 




Facsimile No. (41-22) 740.14.35 


Telephone No. (41-22) 338.83.38 




Form PCT/IB/308 (July 1996)




2859249 
PATENT COOPERATION TREATY

From the INTERNATIONAL BUREAU 



PCT 



INFORMATION CONCERNING ELECTED 
OFFICES NOTIFIED OF THEIR ELECTION 

(PCT Rule 61.3) 



Date of mailing (day/month/year) 

08 November 1999 (08.11.99)



To: 



SAEKI, Norio 

Taka-ai Building, 9th floor 
15-2, Nihonbashi 3-chome 
Chuo-ku 
Tokyo 103-0027 
JAPON 



Applicant's or agent's file reference 
JA908155 


IMPORTANT INFORMATION 


International application No. 
PCT/JP99/01481 


International filing date (day/month/year) 

24 March 1999 (24.03.99) 


Priority date (day/month/year) 

24 March 1998 (24.03.98) 


Applicant 

JAPAN SCIENCE AND TECHNOLOGY CORPORATION et al 



1. The applicant is hereby informed that the International Bureau has, according to Article 31(7), notified each of the following 
Offices of its election: 

National :AU,CA,US 

2. The following Offices have waived the requirement for the notification of their election; the notification will be sent to them 
by the International Bureau only upon their request: 

None 

3. The applicant is reminded that he must enter the "national phase" before the expiration of 30 months from the priority date
before each of the Offices listed above. This must be done by paying the national fee(s) and furnishing, if prescribed, a
translation of the international application (Article 39(1)(a)), as well as, where applicable, by furnishing a translation of any
annexes of the international preliminary examination report (Article 36(3)(b) and Rule 74.1).

Some offices have fixed time limits expiring later than the above-mentioned time limit. For detailed information about the
applicable time limits and the acts to be performed upon entry into the national phase before a particular Office, see Volume II
of the PCT Applicant's Guide.

The entry into the European regional phase is postponed until 31 months from the priority date for all States designated for
the purposes of obtaining a European patent.





The International Bureau of WIPO 
34, chemin des Colombettes 
1211 Geneva 20, Switzerland

Facsimile No. (41-22) 740.14.35 


Authorized officer: 

Maria Kirchner

Telephone No. (41-22) 338.83.38



Form PCT/IB/332 (September 1997) 



2940170 
PATENT COOPERATION TREATY

From the INTERNATIONAL BUREAU 



NOTIFICATION OF RECEIPT OF 
RECORD COPY 

(PCT Rule 24.2(a)) 



To: 



SAEKI, Norio 

Taka-ai Building, 9th floor 
15-2, Nihonbashi 3-chome 
Chuo-ku 
Tokyo 103-0027 
JAPON 



Date of mailing (day/month/year) 
23 April 1999 (23.04.99) 


IMPORTANT NOTIFICATION 


Applicant's or agent's file reference
JA908155 


International application No. 
PCT/JP99/01481 




The applicant is hereby notified that the International Bureau has received the record copy of the international application as 
detailed below. 

Name(s) of the applicant(s) and State(s) for which they are applicants: 

JAPAN SCIENCE AND TECHNOLOGY CORPORATION (for all designated States except US) 

MORI, Satoshi et al (for US) 

International filing date 24 March 1999 (24.03.99) 

Priority date(s) claimed 24 March 1998 (24.03.98) 

Date of receipt of the record copy
by the International Bureau 09 April 1999 (09.04.99)

List of designated Offices

EP :AT,BE,CH,CY,DE,DK,ES,FI,FR,GB,GR,IE,IT,LU,MC,NL,PT,SE
National :AU,CA,US



ATTENTION 

The applicant should carefully check the data appearing in this Notification. In case of any discrepancy between these data 
and the indications in the international application, the applicant should immediately inform the International Bureau. 
In addition, the applicant's attention is drawn to the information contained in the Annex, relating to: 

| X | time limits for entry into the national phase 

| X| confirmation of precautionary designations 

| X | requirements regarding priority documents 

A copy of this Notification is being sent to the receiving Office and to the International Searching Authority. 



The International Bureau of WIPO 


Authorized officer:




34, chemin des Colombettes 


M. Sakai 


1211 Geneva 20, Switzerland




Facsimile No. (41-22) 740.14.35


Telephone No. (41-22) 338.83.38



Form PCT/IB/301 (July 1998) 



ANNEX TO FORM PCT/IB/301



INFORMATION ON TIME LIMITS FOR ENTERING THE NATIONAL PHASE 



The applicant is reminded that the "national phase" must be entered before each of the designated Offices indicated in the 
Notification of Receipt of Record Copy (Form PCT/IB/301) by paying national fees and furnishing translations, as prescribed by 
the applicable national laws. 

The time limit for performing these procedural acts is 20 MONTHS from the priority date or, for those designated States 
which the applicant elects in a demand for international preliminary examination or in a later election, 30 MONTHS from the 
priority date, provided that the election is made before the expiration of 19 months from the priority date. Some designated (or 
elected) Offices have fixed time limits which expire even later than 20 or 30 months from the priority date. In other Offices an 
extension of time or grace period, in some cases upon payment of an additional fee, is available. 
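
The month-based time limits described above can be worked through for this application's priority date (24 March 1998). The following is a minimal sketch, not an official calculator: the helper name `add_months` is hypothetical, it assumes the usual convention that a period in months ends on the same-numbered day of the later month (or on the last day of that month if no such day exists), and it ignores any adjustment for non-working days and any later limits fixed by individual Offices.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same-numbered day `months` months after `start`.

    If the target month has no such day (for example 31 January plus one
    month), the last day of the target month is returned instead.
    """
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Priority date taken from the notifications in this file.
priority_date = date(1998, 3, 24)

limits = {
    "election via demand (19 months)": 19,
    "national phase without election (20 months)": 20,
    "national phase after timely election (30 months)": 30,
}
for label, months in limits.items():
    print(f"{label}: {add_months(priority_date, months).isoformat()}")
```

Under these assumptions the 19-month date falls on 24 October 1999, so a demand filed on 20 October 1999 is consistent with the "was made before the expiration of 19 months" box ticked in the Notification of Election.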

In addition to these procedural acts, the applicant may also have to comply with other special requirements applicable in
certain Offices. It is the applicant's responsibility to ensure that the necessary steps to enter the national phase are taken in a
timely fashion. Most designated Offices do not issue reminders to applicants in connection with the entry into the national
phase.

For detailed information about the procedural acts to be performed to enter the national phase before each designated
Office, the applicable time limits and possible extensions of time or grace periods, and any other requirements, see the relevant
Chapters of Volume II of the PCT Applicant's Guide. Information about the requirements for filing a demand for international
preliminary examination is set out in Chapter IX of Volume I of the PCT Applicant's Guide.

GR and ES became bound by PCT Chapter II on 7 September 1996 and 6 September 1997, respectively, and may, therefore,
be elected in a demand or a later election filed on or after 7 September 1996 and 6 September 1997, respectively, regardless of
the filing date of the international application. (See second paragraph above.)

Note that only an applicant who is a national or resident of a PCT Contracting State which is bound by Chapter II has
the right to file a demand for international preliminary examination.


CONFIRMATION OF PRECAUTIONARY DESIGNATIONS

This notification lists only specific designations made under Rule 4.9(a) in the request. It is important to check that these
designations are correct. Errors in designations can be corrected where precautionary designations have been made under
Rule 4.9(b). The applicant is hereby reminded that any precautionary designations may be confirmed according to Rule 4.9(c)
before the expiration of 15 months from the priority date. If it is not confirmed, it will automatically be regarded as withdrawn
by the applicant. There will be no reminder and no invitation. Confirmation of a designation consists of the filing of a notice
specifying the designated State concerned (with an indication of the kind of protection or treatment desired) and the payment
of the designation and confirmation fees. Confirmation must reach the receiving Office within the 15-month time limit.


REQUIREMENTS REGARDING PRIORITY DOCUMENTS

For applicants who have not yet complied with the requirements regarding priority documents, the following is recalled.

Where the priority of an earlier national, regional or international application is claimed, the applicant must submit a copy
of the said earlier application, certified by the authority with which it was filed ("the priority document") to the receiving Office
(which will transmit it to the International Bureau) or directly to the International Bureau, before the expiration of 16 months from
the priority date, provided that any such priority document may still be submitted to the International Bureau before the date of
international publication of the international application, in which case that document will be considered to have been received
by the International Bureau on the last day of the 16-month time limit (Rule 17.1(a)).

Where the priority document is issued by the receiving Office, the applicant may, instead of submitting the priority
document, request the receiving Office to prepare and transmit the priority document to the International Bureau. Such request
must be made before the expiration of the 16-month time limit and may be subjected by the receiving Office to the payment
of a fee (Rule 17.1(b)).

If the priority document concerned is not submitted to the International Bureau or if the request to the receiving Office
to prepare and transmit the priority document has not been made (and the corresponding fee, if any, paid) within the applicable
time limit indicated under the preceding paragraphs, any designated State may disregard the priority claim, provided that no
designated Office may disregard the priority claim concerned before giving the applicant an opportunity to furnish the priority
document within a time limit which is reasonable under the circumstances.

Where several priorities are claimed, the priority date to be considered for the purposes of computing the 16-month time
limit is the filing date of the earliest application whose priority is claimed.



Form PCT/IB/301 (Annex) (July 1998) 



002584658 
PCT/JP99/01481



From the INTERNATIONAL BUREAU 



PCT 

NOTIFICATION CONCERNING 
SUBMISSION OR TRANSMITTAL 
OF PRIORITY DOCUMENT 

(PCT Administrative Instructions, Section 411) 


To: 

SAEKI, Norio
Taka-ai Building, 9th floor
15-2, Nihonbashi 3-chome
Chuo-ku
Tokyo 103-0027
JAPON 


Date of mailing (day/month/year) 
26 May 1999 (26.05.99) 




Applicant's or agent's file reference 
JA908155


IMPORTANT NOTIFICATION 


International application No. 
PCT/JP99/01481 


International filing date (day/month/year) 
24 March 1999 (24.03.99) 


International publication date (day/month/year) 

Not yet published 


Priority date (day/month/year) 
24 March 1998 (24.03.98) 


Applicant 

JAPAN SCIENCE AND TECHNOLOGY CORPORATION et al 



1. The applicant is hereby notified of the date of receipt (except where the letters "NR" appear in the right-hand column) by the
International Bureau of the priority document(s) relating to the earlier application(s) indicated below. Unless otherwise
indicated by an asterisk appearing next to a date of receipt or by the letters "NR", in the right-hand column, the priority
document concerned was submitted or transmitted to the International Bureau in compliance with Rule 17.1(a) or (b).

2. This updates and replaces any previously issued notification concerning submission or transmittal of priority documents.

3. An asterisk (*) appearing next to a date of receipt in the right-hand column denotes a priority document submitted
or transmitted to the International Bureau but not in compliance with Rule 17.1(a) or (b). In such a case, the attention
of the applicant is directed to Rule 17.1(c) which provides that no designated Office may disregard the priority claim
concerned before giving the applicant an opportunity, upon entry into the national phase, to furnish the priority document
within a time limit which is reasonable under the circumstances.

4. The letters "NR" appearing in the right-hand column denote a priority document which was not received by the International
Bureau or which the applicant did not request the receiving Office to prepare and transmit to the International Bureau,
as provided by Rule 17.1(a) or (b), respectively. In such a case, the attention of the applicant is directed to Rule 17.1(c) which
provides that no designated Office may disregard the priority claim concerned before giving the applicant an opportunity,
upon entry into the national phase, to furnish the priority document within a time limit which is reasonable under the
circumstances.



Priority date              Priority application No.   Country or regional Office   Date of receipt
                                                      or PCT receiving Office      of priority document

24 March 1998 (24.03.98)   10/96637                   JP                           21 May 1999 (21.05.99)




The International Bureau of WIPO


Authorized officer 






34, chemin des Colombettes 


Juan Cruz 




1211 Geneva 20, Switzerland






Facsimile No. (41-22) 740.14.35 


Telephone No. (41-22) 338.83.38 




Form PCT/IB/304 (July 1998) 
The olfactory receptor (OR) gene cluster on human
chromosome 17p13.3 was subjected to mixed shotgun
automated DNA sequencing. The resulting 412 kb of 
genomic sequence include 17 OR coding regions, 6 of 
which are pseudogenes. Six of the coding regions were 
discovered only upon genomic sequencing, while the 
others were previously reported as partial sequences. 
A comparison of DNA sequences in the vicinity of the 
OR coding regions revealed a common gene structure 
with an intronless coding region and at least one up- 
stream noncoding exon. Potential gene control re- 
gions including specific pyrimidine:purine tracts and 
Olf-1 sites have been identified. One of the pseudo- 
genes apparently has evolved into a CpG island. Four 
extensive CpG islands can be discerned within the 
cluster, not coupled to specific OR genes. The cluster is 
flanked at its telomeric end by an unidentified open 
reading frame (C17orf2) with no significant similarity 
to any known protein. A high proportion of the cluster 
sequence (about 60%) belongs to various families of 
interspersed repetitive elements, with a clear predom- 
inance of LINE repeats. The OR genes in the cluster 
belong to two families and seven subfamilies, which 
show a relatively high degree of intermixing along the 
cluster, in seemingly random orientations. This 
genomic organization may be best accounted for by a 
complex series of evolutionary events. © 2000 Academic Press

Sequence data from this article have been deposited with the
EMBL/GenBank Data Libraries under Accession Nos. AC007194,
AF087915-AF087930, and AF155225.

1 To whom correspondence should be addressed. Telephone:
972-8-9343683. Fax: 972-8-9344112. E-mail:
bmlancet@weizmann.weizmann.ac.il.

INTRODUCTION

Environmental stimuli are recognized by sensory
neurons, and this information is transmitted to the
brain, where it is decoded to provide an internal
representation of the external world. The vertebrate
olfactory system is exquisitely adapted for recognition and
discrimination among a large number of odorants, with 
high sensitivity and specificity (Laurent, 1997; Pilpel et 
al., 1998). The initial step in olfactory discrimination 
involves the interaction of odorant molecules with a 
large repertoire of specific receptors. 

Olfactory receptor (OR) genes encode G-protein-cou- 
pled seven-transmembrane proteins (Buck and Axel, 
1991). Unlike the somatic gene recombination and mu- 
tation mechanisms that account for immunoglobulin 
diversity, the OR repertoire diversity seems to be 
germline-inherited. The OR gene superfamily is the 
largest in the mammalian genome. It is estimated to 
consist of several hundred genes in mammalian species 
and about 100 genes in catfish (reviewed in Mom- 
baerts, 1999), suggesting a large expansion of the OR 
repertoire in higher vertebrates. A given OR was
shown to be expressed by about 0.1% of the sensory
neurons within the rodent olfactory epithelium (Vassar
et al., 1993; Ressler et al., 1993). Given an estimated 500-1000
OR genes in the rat genome (Buck and Axel, 1991),
these findings are consistent with the phenomenon of
clonal and allelic exclusion in ORs (Lancet, 1991;
Chess et al., 1994; Malnic et al., 1999), in which a
neuron expressing a given receptor does not activate
expression of other ORs. The multiplicity of receptors
reflects the needs of a combinatorial coding system, in
which each receptor may bind many odorants and each
odorant binds several receptors (Lancet, 1986; Malnic
et al., 1999), as analyzed by a probabilistic model
(Lancet et al., 1993).

The OR repertoire contains a large percentage of
pseudogenes that may be important for the generation
and maintenance of diversity. The especially large
number of OR pseudogenes in the human genome (up
to ~70%) (Rouquier et al., 1998) may reflect a loss
of functional genes in the "microsmatic" primates
(Sharon et al., 1999).

Many of the human OR genes appear in genomic
clusters with 10 or more members (Ben-Arie et al.,
1994; Glusman et al., 1996; Vanderhaeghen et al.,
1997; Carver et al., 1998; Trask et al., 1998). An
estimated total number of 500 human OR genes would
indicate 30-50 such clusters, about half of which have
been identified by cloning or by fluorescence in situ
hybridization (FISH) on almost all human chromosomes
(Rouquier et al., 1998). In mouse, in which the
estimated OR number may reach more than 1000, 12
clusters have so far been identified by genetic linkage
on seven different chromosomes (Sullivan et al., 1996).
OR genomic clustering also was indicated by Southern
hybridization analysis in dog (Issel-Tarver and Rine,
1996) and by genomic mapping in zebrafish (Barth et
al., 1997). The complete collection of OR-containing
genomic regions has been termed the "olfactory
subgenome" (Ben-Arie et al., 1993; Glusman et al., 1996),
estimated to encompass ~1% of the entire genome of
mammalian species.

The availability of genomic sequences surrounding
OR genes provides a unique opportunity to study the
evolution of this multigene superfamily and to trace
the mechanisms or genome dynamics that may have
been responsible for its current size and variety. Using
sequence comparison, ORs are classified into families
(>40% amino acid identity) and subfamilies (>60%
amino acid identity) (Ben-Arie et al., 1994). An analysis
of clusters in human (Ben-Arie et al., 1994; Trask et al.,
1998), mouse (Sullivan et al., 1996), and zebrafish
(Barth et al., 1997) indicates that each cluster may
contain members of several subfamilies or even
families. This suggests that present-day OR clusters have
evolved in a complex path, involving ancient precursor
gene duplications, as well as more recent within-cluster
gene duplications. Conversely, genes of a given
subfamily may be found in more than one cluster
(Sullivan et al., 1996; Rouquier et al., 1998), suggesting
that clusters may be duplicated, in part or in their
entirety. In the latter case, this may occur via a
duplication process that generates paralogous regions on
different chromosomes. Repetitive genomic DNA
elements (e.g., Alu and LINE) were suggested to have a
crucial role in mediating recombination events that
lead to OR gene duplications (Glusman et al., 1996).
Finally, the olfactory subgenome has been hypothesized
to be "exclusive" in the sense that no non-OR
genes have been found interspersed with OR genes
(Glusman et al., 1996).
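
The family and subfamily thresholds quoted above (>40% and >60% amino acid identity) can be expressed as a small rule. This is only an illustrative sketch: the function name `classify_pair` and its return labels are hypothetical, and how the percent identity itself is computed (alignment method, gap handling) is left outside the example because the text does not specify it.

```python
def classify_pair(identity_percent: float) -> str:
    """Classify two OR proteins from their pairwise amino acid identity.

    Thresholds follow the text: more than 60% identity places the pair in
    the same subfamily, more than 40% in the same family.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent > 60.0:
        return "same subfamily"
    if identity_percent > 40.0:
        return "same family"
    return "different families"


# Hypothetical identity values, for illustration only.
for value in (35.0, 52.5, 78.0):
    print(value, classify_pair(value))
```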

The OR coding regions are uninterrupted by introns
in the genome (Ben-Arie et al., 1994), like those of many
G-protein-coupled receptors (Gentles and Karlin, 1999),
though the possibility of at least one exception has
been reported (Walensky et al., 1998). Characterization
of human and murine OR genes revealed an intron
separating a noncoding leader exon from the coding
exon (Glusman et al., 1996; Asai et al., 1996) and
showed that transcription is initiated from a region
upstream from the leader exon (Asai et al., 1996;
Walensky et al., 1998; Qasba and Reed, 1998). The
mechanism of control that generates the complex
pattern of odorant receptor expression still remains
largely unknown.

Partial genomic sequences of OR gene clusters have
been published (Glusman et al., 1996; Brand-Arpon et
al., 1999), giving initial genomic insights into the
organization of OR genes, their structure and evolution,
as well as some hints on potential mechanisms for
transcriptional control. We present here the sequencing
and analysis of the first complete OR gene cluster.
The full analysis and annotation of the sequence, as
well as ancillary information, can be viewed at
http://bioinfo.weizmann.ac.il/papers/C17olf_cluster.

MATERIALS AND METHODS 

Reagents and equipment. Cosmids were from library ICRFc105
isolated from human cell line LCL127 (Nizetic et al., 1991) from the
Resource Center, Primary Database of the German Human Genome
Project (Nizetic et al., unpublished results). Ten clones of the 80
cosmids covering the cluster were chosen for sequencing as follows:
F03103 (cos17), D10132 (cos26), H07155 (cos32), B01193 (cos39),
F06137 (cos46), E06173 (cos53), E06184 (cos58), F1155 (cos65),
H0468 (cos68), and D093 (cos73). In addition, two PAC clones,
C10910Q3 (P8) and E02527Q3 (P123), from the whole-genome library
LLNLP704 (Ioannou et al., 1994) were sequenced. Additional cosmids
mapped (written in pairs of ICRFc105 number-our number)
were H1241-4; D0345-6; E0364-7; C0435-9; A08110-19; G06112-20;
F04113-21; G11124-22; A02138-27; B09144-29; H09113-42;
F08120-44; D01121-45; A06163-51; C10182-56; F09183-57;
B0595-61; D0759-62; B1015-63; F082-66; E101-69; C127-70;
C117-71; D0569-74; A107-75; and H0689-76. Additional PACs
mapped (written in pairs of LLNLP704 number-our number) were
N01235Q19-1; E13239Q19-2; M121198Q4-3; B211178Q4-4;
M15660Q3-5; I17730Q3-6; J10811Q3-7; C10910Q3-8;
E10912Q3-9; F05891Q3-10; M15947Q3-11; N04302Q19-101;
N21613Q3-102; M21613Q3-103; K22613Q3-104; E24597Q3-106;
P18817Q3-108; B02928Q3-109; P041058Q3-110; P041064Q3-111;
L091077Q3-112; A091041Q3-113; P021089Q4-119; E02527Q3-123;
P16680Q3-128; M22845Q3-129; P241019Q3-131; and
P231019Q3-132.

Mapping of PACs. The PAC clones used in the current work were
obtained from RZPD using DNA probes prepared from the cosmids at
the ends of the three cosmid contigs described (Ben-Arie et al., 1994),
i.e., cosmids 26, 53, 58, and 68. All the PAC clones thus obtained
were subjected to PCR analysis with primers specific for several OR
coding regions (ORs 93, 201, 2, 7, 30, 23, 24, 208, 209, 210, 4, and 31),
as well as with some cosmid ends (Fig. 1).

Generation of sequencing templates. Except for cosmid 65, all the
clones were sequenced using the shotgun strategy (Bodenteich et al.,
1993; Rowen and Koop, 1994). Cosmid or PAC DNA was sheared
either by sonication or by nebulization, and the ends were repaired
by treatment with T4 DNA polymerase followed by Klenow treatment
or alternatively by treatment with mung bean nuclease followed
by T4 DNA polymerase. The repaired DNA was size-fractionated
on a 0.8% agarose gel, and fragments of 0.8-1.5 kb were excised
and purified with a Qiagen gel extraction kit (Qiagen GmbH,
Germany) for ligation with M13 RF phage DNA (Novagen).
Alternatively, fragments of 2-6 kb were excised and purified from
low-melting-point 0.8% agarose gels with gelase (Epicentre) or a Qiagen
kit as above, for ligation to pBluescript (Stratagene) or pUC18
(Pharmacia) vectors. Ligation was performed with a Rapid Ligase kit
(Boehringer) or with a Fast Link kit (Epicentre) according to the
manufacturer's instructions. The ligated DNA was used for
transformation of XL1 Blue competent cells (Stratagene). DNA from single
clones was subjected to direct sequencing, or PCR products of these
clones were subjected to sequencing reactions. When direct sequencing
was applied, DNA was prepared by Qiagen kits; either an M13
extraction kit for single-stranded DNA or the turbo miniprep kit for
double-stranded DNA was used according to the manufacturer's
instructions (Qiagen GmbH, Germany). These kits were used in their
96-well format with the 96-manifold apparatus, which was connected
to a Biomek 2000 robot (Beckman). Some clones were also prepared
by a cleared lysate filter-based protocol (Chissoe et al., 1995) and
sequenced as described (Bodenteich et al., 1993). When PCR products
were to be sequenced, they were cleaned by a 96-well Gel
Filtration Block (Edge BioSystems) prior to fluorescence labeling.

Sequencing reactions. DNA was labeled either by fluorescence- 
labeled primers or by fluorescent dye terminators — Prism cycle se- 
quencing and Big-Dyes kits (Perkin-Elmer/Applied Biosystems) — 
and analyzed on ABI 373 or ABI 377 sequencers. 

Finishing and gap closure. Finishing of cosmid 65 as well as 
finishing of cosmids 17, 68, and PAC 8 was performed using the 
differential extension with nucleotide subsets (DENS) method (Raja 
et al., 1997). Briefly, in this method single-stranded DNA is synthe- 
sized by PCR and is then subjected to DNA sequencing by primer 
walking using a presynthesized primer library. Cosmid 65 was used 
as template to sequence the 10-kb gap between cosmids 46 and 58. 
Primers for synthesis of the desired segment on cosmid 65 were 
designed using the programs Oligo (Rychlik, 1995) and Amplify 
(Engels, 1993) based on known sequence from the overlapping cos- 
mids 46 and 58. Sequence finishing of the other clones was performed 
by standard primer walking along the regions where sequence was in 
doubt. 

Sequence assembly. Assembly was performed using Sequencher 
3.1 software from GeneCodes Corp. and/or phrap (University of 
Washington). Based on experience with the sequencing methodology 
in other projects, we estimate the precision of the consensus se- 
quence to be over 99.9%. Additional quality control was obtained by 
comparison of overlapping, independently sequenced clones. The 
cluster sequence has been deposited with GenBank under Accession 
No. AC007194. The full coding sequences for the 17 OR genes in the 
cluster have been deposited with GenBank under Accession Nos.
AF087915-AF087930 and AF155225.

Sequence analysis. Sequences were analyzed using the 
GESTALT Workbench (Glusman and Lancet, in preparation). 
Briefly, GESTALT is a Perl-based workbench for automated large- 
scale genomic sequence analysis, comparison, and annotation. GE- 
STALT integrates and depicts graphically the output of diverse 
sequence analysis algorithms, including database searches, gene 
modeling tools, recognition of interspersed repeats, statistical ORF 
analysis, and compositional analyses, as well as user annotation. 

Open reading frame analysis. The significance of each observed 
open reading frame (ORF) of length L was estimated by calculating 
an expectation value E(L) as the probability of finding an ORF of 
length L or longer, times the number of possible such ORFs (approx- 
imately the length of the sequence). The probability for length >L 
was calculated assuming an exponential distribution with the exten- 
sion parameter being the frequency of stop codons in the sequence, 
using either the observed stop frequency in the entire sequence or 
the expected value for the local G + C content. 
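
The expectation value described above can be sketched as follows. This is a minimal illustration, not the GESTALT implementation: it assumes ORF length is counted in codons, that the stop frequency is expressed per codon, that nucleotides are independent with A = T and G = C when deriving the expected stop frequency from G+C content, and that the number of possible ORFs equals the sequence length. The function names are hypothetical, and the sketch is not expected to reproduce the values reported in the text.

```python
import math


def expected_stop_frequency(gc_fraction):
    """Expected per-codon frequency of stop codons (TAA, TAG, TGA).

    Assumes independent nucleotides with freq(A) = freq(T) and
    freq(G) = freq(C), which is a simplification.
    """
    at = (1.0 - gc_fraction) / 2.0  # frequency of A (and of T)
    gc = gc_fraction / 2.0          # frequency of G (and of C)
    # TAA -> at^3; TAG and TGA -> at^2 * gc each
    return at ** 3 + 2.0 * at ** 2 * gc


def orf_expectation(orf_codons, sequence_length, stop_frequency):
    """E(L): probability of an ORF of length >= L, times the number of
    possible ORFs (approximated by the sequence length)."""
    # Exponential tail with the stop-codon frequency as rate parameter.
    p_at_least = math.exp(-stop_frequency * orf_codons)
    return sequence_length * p_at_least
```

A lower stop frequency (for example, in a G+C-rich region) gives a larger E(L) for the same ORF length, which is why a long ORF in a G+C-rich region is less significant than the same ORF in an A+T-rich region.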

Identification of coding regions. Statistically significant open 
reading frames were studied by database searches, unless recognized 
to belong to repetitive elements by RepeatMasker (Smit and Green, 
1997). The entire genomic sequence obtained was analyzed using 
FASTY (Pearson et al., 1997) against a database of translated OR 
sequences (Glusman et al., in preparation) as well as by dot-plot to 
representative OR nucleotide sequences. GenScan (Burge and Karlin,
1997) and fgenes 1.6 (Solovyev and Salamov, 1997) were used to
build comprehensive gene models within the cluster sequence. For 
each OR coding region identified in the cluster, the prediction success
was calculated as the fraction of its nucleotides predicted to be
within a coding exon on the proper strand.
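
The prediction success measure can be illustrated with a short sketch. The coordinate convention (1-based, inclusive) and the tuple layout are assumptions made for this example only.

```python
def prediction_success(coding_region, predicted_exons):
    """Fraction of coding-region nucleotides covered by predicted
    coding exons on the same strand.

    coding_region: (start, end, strand), 1-based inclusive (assumed).
    predicted_exons: iterable of (start, end, strand) tuples.
    """
    start, end, strand = coding_region
    covered = set()
    for ex_start, ex_end, ex_strand in predicted_exons:
        if ex_strand != strand:
            continue  # exons on the wrong strand do not count
        lo, hi = max(start, ex_start), min(end, ex_end)
        if lo <= hi:
            covered.update(range(lo, hi + 1))
    return len(covered) / (end - start + 1)
```

Overlapping exons are counted once, and an exon predicted on the opposite strand contributes nothing, matching the "proper strand" requirement.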

Identification of CpG islands. The local concentration of CpG
dinucleotides was calculated as the contrast value (CV), the ratio
between observed and expected frequency: CV = [CpG]/([C][G]),
where [C] indicates the frequency of C nucleotides, etc. CpG dinucleotides
are underrepresented in the human genome (Karlin et al.,
1998). CpG islands are defined as regions over 200 bp with CpG CV
> 0.6 and G+C content above 50%.
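
A minimal sketch of the contrast value and the island criteria follows. It evaluates a single candidate segment; how windows are chosen along the genomic sequence is not specified in the text and is left out. Function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in the segment."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_contrast_value(seq):
    """CV = observed CpG frequency / (freq(C) * freq(G))."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return 0.0
    freq_c = seq.count("C") / n
    freq_g = seq.count("G") / n
    if freq_c == 0 or freq_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    freq_cpg = seq.count("CG") / (n - 1)
    return freq_cpg / (freq_c * freq_g)


def is_cpg_island(seq, min_length=200, min_cv=0.6, min_gc=0.5):
    """Apply the three criteria: length, CpG contrast value, G+C content."""
    return (len(seq) > min_length
            and cpg_contrast_value(seq) > min_cv
            and gc_content(seq) > min_gc)
```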

Phylogenetic analysis. The conceptually translated OR sequences 
from this cluster were compared to additional human OR sequences, 
chosen to represent Class II families 1-7 from several chromosomal 
locations. Fish and human Class I representatives were added for
comparison. The human β-3 adrenergic receptor (HSB3A) was used
as outgroup. Multiple alignment and neighbor-joining analysis were
performed using ClustalX (Higgins et al., 1996) with default parameters.
Confidence was estimated using 1000 rounds of bootstrapping.
Phylogenetic trees were drawn using TreeView (Page, 1996). 

Divergence time estimation. This was performed (Glusman et al., 
1996) by comparing nucleotide sequences on which no selection is 
assumed to take place. The estimated substitution level (ESL) was 
calculated using the one-parameter model (Jukes and Cantor, 1969) 
and then translated to million years ago (Mya) as described for the
ψη-globin gene locus (Bailey et al., 1991) with substitution rates
(expressed as 10^-9 substitutions/site/year) of 1.1 for the last 19.2
Mya (gibbon/human divergence), 1.7 for the period 25.0-19.2 Mya
(cercopithecoid/hominoid divergence), 1.9 for the period 34.2-25.0
Mya (platyrrhine/catarrhine divergence), 3.5 for the period 55.0-34.2
Mya (strepsirhini/haplorhini divergence), and 5.0 before 55 Mya
(mammalian-wide). These figures reflect the "hominoid slowdown" in
nucleotide sequence mutation frequencies (Bailey et al., 1991) and do
not reflect ψη-globin-specific evolution rates.
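
The two steps above (a one-parameter distance correction, then conversion to time using period-specific rates) can be sketched as follows. The Jukes-Cantor formula is standard; the piecewise accumulation of substitutions backward in time is an assumption about how the rates are combined, and the sketch is not guaranteed to reproduce the dates quoted elsewhere in this paper. Names are hypothetical.

```python
import math

# (older boundary of the period in Mya, rate in 1e-9 substitutions/site/year),
# taken from the rates listed in the text, youngest period first.
RATE_PERIODS = [(19.2, 1.1), (25.0, 1.7), (34.2, 1.9), (55.0, 3.5)]
RATE_BEFORE_55 = 5.0  # mammalian-wide rate before 55 Mya


def jukes_cantor(p):
    """One-parameter correction of an observed difference fraction p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def substitutions_to_mya(esl):
    """Convert an estimated substitution level (substitutions/site) to Mya,
    accumulating substitutions period by period back in time."""
    remaining = esl
    previous_boundary = 0.0
    for boundary, rate in RATE_PERIODS:
        span = boundary - previous_boundary  # period length in Myr
        capacity = rate * 1e-3 * span        # substitutions/site in this period
        if remaining <= capacity:
            return previous_boundary + remaining / (rate * 1e-3)
        remaining -= capacity
        previous_boundary = boundary
    return previous_boundary + remaining / (RATE_BEFORE_55 * 1e-3)
```

For example, `substitutions_to_mya(jukes_cantor(p))` would convert an observed difference fraction `p` between a sequence and its inferred ancestor into an age estimate under these assumptions.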

Gene structure prediction. To predict potential upstream noncod- 
ing exons, the genomic environment of each OR coding region in the 
cluster (except for the 5' truncated OR17-25) was extensively ana- 
lyzed to generate a gene model. The genomic region of each OR gene 
was defined to include up to 15 kb upstream from the start codon and 
5 kb downstream from the termination codon. The relevant genomic 
sequence employed was trimmed for seven OR genes, to avoid overlaps:
for OR17-228, 12.5 kb were used (downstream from OR17-40);
OR17-24 and OR17-40 have a common upstream region in opposite
orientations, and therefore half (8.9 kb) was taken for each. Similarly,
11.1 kb were used for OR17-201 and OR17-2. From our analysis,
the 5' genomic region of OR17-4 includes at least 12.8 kb; the
genomic region of OR17-210 was correspondingly trimmed to 10.4
kb. Several exon prediction programs based on different algorithms 
were used, including GenScan (Burge and Karlin, 1997) (suboptimal 
exon cut-off used: 0.1), GRAIL II (Xu et al., 1994), Genie (Kulp et al.,
1996), and the programs fgene, fgenes, hexon, and fex from the
GeneFinder package (Solovyev et al., 1995; Solovyev and Salamov,
1997). Potential exons recognized by at least three programs were
analyzed further. Dot-plot analysis was used to determine the ex- 
tents of the duplicated regions within subfamilies, and only exons 
within such regions were considered as conserved and therefore 
potentially functional. Dot-plot analysis and sequence alignments 
were analyzed by GeneAssist 1.1 from ABI, Perkin-Elmer.
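
The "recognized by at least three programs" filter can be sketched as below. The text does not state how two predictions were judged to be the same exon; this example assumes that any same-strand coordinate overlap counts as agreement. The program labels in the usage are placeholders, and no real program output format is parsed here.

```python
def overlaps(a, b):
    """True if two (start, end, strand) exons share a strand and overlap."""
    return a[2] == b[2] and a[0] <= b[1] and b[0] <= a[1]


def exons_supported_by(predictions, min_programs=3):
    """Return (program, exon, supporters) for each predicted exon that is
    overlapped by predictions from at least `min_programs` programs,
    counting the program that produced it.

    predictions: dict mapping a program label to a list of
    (start, end, strand) tuples.
    """
    supported = []
    for program, exons in predictions.items():
        for exon in exons:
            supporters = {program}
            for other, other_exons in predictions.items():
                if other != program and any(overlaps(exon, o) for o in other_exons):
                    supporters.add(other)
            if len(supporters) >= min_programs:
                supported.append((program, exon, sorted(supporters)))
    return supported


# Hypothetical predictions, for illustration only.
example = {
    "program_a": [(100, 200, "+"), (500, 600, "+")],
    "program_b": [(110, 190, "+")],
    "program_c": [(150, 250, "+"), (500, 600, "-")],
    "program_d": [],
}
consensus = exons_supported_by(example)
```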

Prediction of additional gene structure elements. The acceptor 
splicing sites for the predicted coding exons were detected by the 
programs SPL from the GeneFinder package (Solovyev and Salamov, 
1997) and SSPNN (Brunak et al., 1991). Donor splicing sites for the
potential upstream exons were detected by the SPL program with an 
LDF value of 0.85 used as cut-off between strong and weak sites. 
Polyadenylation signals were detected using POLYAH (Solovyev and 
Salamov, 1997). Potential promoters and corresponding transcrip- 
tion start sites (TSS) were identified using TSSG and TSSW (So- 
lovyev and Salamov, 1997) with minimal score 0.4 and by PPNN 
(Reese et al., 1996) with minimal score 0.8. 

Control region analysis. To detect any significant similarities
among potential control regions, the oligonucleotide analysis tool
(van Helden et al., 1998) from the Yeast Regulatory Tools (van
Helden et al., in preparation) was used. We also implemented a
variant that relaxes the requirements on the patterns found, allowing
the detection of similar patterns in addition to identical patterns.
The sequences were also analyzed using the segment pair overlap
method implemented in MACAW (Schuler et al., 1991), as well as the
Gibbs sampler as implemented with the Yeast Regulatory Tools. The
locations of binding sites for members of the two families of transcription
factors NF-1 and O/E were examined by MatInspector V2.2
(Quandt et al., 1995) using the TransFac database. Olf-1 and NF-1
sites were also mapped by the Word Mapper tool of the GESTALT
Workbench (Glusman and Lancet, in preparation) using the consensus
sequences TCCCNNRRGRR and GCTGGCANNNTGCCAG, respectively
(R represents purines). Potential recombinatorial signal
sequences (Sakano et al., 1981) were mapped using the consensus
CACTGTG(N)xGGTTTTTGT (where x is 12 or 23).
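
Mapping a degenerate consensus onto a sequence can be sketched with regular expressions. This is not the Word Mapper tool; it is a minimal illustration that supports only the IUPAC codes R, Y, and N, searches both strands, and reads the recombination signal consensus as the heptamer CACTGTG, a spacer of 12 or 23 arbitrary nucleotides, and the nonamer GGTTTTTGT (an interpretation of the consensus given above). Function names are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes needed for the consensus sequences above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def reverse_complement(consensus):
    """Reverse complement of an IUPAC consensus (R <-> Y, N unchanged)."""
    return consensus.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus):
    """Return sorted (position, strand) hits, 0-based, on both strands.
    A lookahead is used so that overlapping hits are all reported."""
    seq = seq.upper()
    hits = set()
    for strand, pattern in (("+", consensus), ("-", reverse_complement(consensus))):
        regex = re.compile("(?=" + consensus_to_regex(pattern) + ")")
        hits.update((m.start(), strand) for m in regex.finditer(seq))
    return sorted(hits)


def find_rss(seq):
    """Return sorted (position, spacer) hits for heptamer-spacer-nonamer
    signals with a 12- or 23-nucleotide spacer (forward strand only)."""
    seq = seq.upper()
    hits = []
    for spacer in (12, 23):
        regex = re.compile("(?=CACTGTG[ACGT]{%d}GGTTTTTGT)" % spacer)
        hits.extend((m.start(), spacer) for m in regex.finditer(seq))
    return sorted(hits)
```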

RESULTS AND DISCUSSION 

Complete Sequence of an Olfactory Receptor Gene 
Cluster 

Mapping and sequencing. We have obtained 412 kb
of contiguous genomic sequence encompassing the OR 
gene cluster on human chromosome 17. The sequence 
is a composite of 12 cosmid and 2 PAC clones (Fig. 1). 
Additional PAC clones (P110 and P111, see Fig. 1) that
overlap with cos58 and extend the cluster map at its 
telomeric end have been identified, but PCR analysis 
with OR-specific OR5B/OR3B degenerate primers 
(Ben-Arie et al., 1994) suggested that they were devoid
of additional OR coding regions. The final size of the
cluster sequence fits the ~400 kb estimated in the
initial characterization of this cluster (Ben-Arie et al.,
1994). STS marker 506 (D17S126) is present within 
this cluster (in cos73) as originally mapped by PCR 
(Ben-Arie et al., 1994). In addition, STS marker
D17S1548 (WI-5436) is present at the end of this cluster
(in cos58). D17S1548 is mapped to 48.8 cR from the
17p telomere or 4.521 Mb according to the UDB map of
chromosome 17 (Chalifa-Caspi et al., 1997); see
http://bioinformatics.weizmann.ac.il/udb.

Three cosmid contigs have been described (Ben-Arie
et al., 1994), and their orientation and distances have
been estimated by free-chromatin FISH. Analysis of 
several PAC clones covering this region enabled us to 
correct the physical map of the cluster, as shown in Fig. 
1. The original improper mapping of several cosmids 
was found to derive from the existence of a large 
genomic duplication, described below. 

An unclonable region. A 2.6-kb fragment in cosmids
17 and 68 (open boxes, Fig. 1) was particularly
refractory to M13 subcloning and significantly underrepresented
in sequenced shotgun subclones. Closure
was accomplished by DENS primer walking (Raja et
al., 1997). Interestingly, this segment posed no shotgun
cloning problems upon direct sequencing from a
partially deleted cosmid clone (R28, Fig. 1). Analysis of
the resulting sequence shows that the fragment is located
between very old MIR (SINE) and Charlie (DNA/MER1
type) repeats, which are ~20% divergent from
their respective consensus sequences. No internal repeats
or palindromes were detected in this apparently
unclonable segment, but it was found to be singularly
G+C-poor (30% overall, down to 25% in the middle),
culminating with an A+T low-complexity region.

The Detected OR Genes 

The genomic sequence in the OR cluster was com- 
pared with a database of OR gene sequences at both 
the nucleotide and the translated amino acid levels. In 
total, 17 OR coding regions were recognized (Table 1), 
confirming the presence of 11 of the OR genes described
in the initial report on this cluster (Ben-Arie et
al., 1994), as well as 6 formerly undetected coding
regions. Approved nomenclature symbols (Glusman et
al., in preparation) are listed in Table 1. From sequence
analysis only, 6 of the 17 OR coding regions are
pseudogenes, while the remaining 11 are apparently
functional. The expression of OR17-93 and OR17-40
has been previously shown experimentally (Ben-Arie et
al., 1994; Crowe et al., 1996). We have experimental
evidence that all the remaining apparently functional
genes are transcribed (Sosinsky et al., in preparation)
except for OR17-6, which may turn out to be a pseudogene.

We have previously reported (Glusman et al., 1996)
the sequence analysis of a cosmid (cos39) covering the
middle of the cluster, which encodes two genes
(OR17-40 and OR17-228) and two fused, truncated
pseudogenes (OR17-24 and OR17-25, Fig. 2a). The
genomic sequence of cosmids D53 and L53 confirmed
the existence of OR17-32, an allelic variant of OR17-2
that differs from it by only 2 bp of 648 bp (Sharon et al.,
in preparation), indicating that the individual from
whom the cosmid library was created was heterozygous
at this locus. Similarly, the presence of the
OR17-23 pseudogene was confirmed, but its OR17-90
variant was not detected in the sequenced cosmids nor
in population studies (Sharon et al., in preparation). In
contrast, OR17-30 occurs as two almost identical but
disjointed copies in the cluster: the newly detected OR
gene (hereafter referred to as OR17-31) appears to be
the OR gene closest to the telomeric end of the cluster.

Two additional OR pseudogenes (OR17-208 and
OR17-1) and two additional, apparently functional OR
genes (OR17-6 and OR17-7) were detected. OR17-208
has an in-frame stop codon (Fig. 2a) but otherwise is
apparently intact, suggesting that this is a relatively
recent mutation in a gene from family 1. Indeed, its
chimpanzee orthologue lacks this stop codon (Sharon et
al., 1999). OR17-1 harbors several alterations that render
it a pseudogene, including four frameshifting mutations
(Fig. 2a). These four novel OR regions represent
three new subfamilies within family 1 (Fig. 3).

Of the 17 OR coding regions in this cluster, 6 (3
pseudogenes and 3 apparently functional genes) were
detected only by genomic sequencing. The 11 previously
detected OR genes in this cluster have between
zero and three mismatches in each degenerate PCR
primer site or up to four mismatches in total (Fig. 2b).
Three of the previously undetected OR coding regions
(OR17-1, OR17-25, and OR17-208) have a larger number
of mismatches. OR17-31 is almost identical to
OR17-30 in the coding region (7 differences of 939 bp,
or 99.3% identical). These results underscore the importance
of genomic sequencing for reaching a definitive
characterization of gene clusters, even when the
gene families are well studied.

FIG. 1. Physical map of the olfactory receptor gene cluster on human chromosome 17p13.3. (a) Location and orientation from 5' to 3' of
the OR coding regions (arrowheads) and the D17S126 and D17S1548 markers. BR8 represents the approximate DNA breakpoint of
Miller-Dieker syndrome patient BR8 (Ben-Arie et al., 1994). Pseudogenes are indicated by ψ and white circles. (b) The sequenced cosmid and
PAC clones as detailed under Materials and Methods. Thin lines indicate regions not sequenced. The dashed line in cosmid R28 indicates the
region deleted in this clone. Open boxes in cosmids 17 and 68 indicate the unclonable region. (c) The approximate extents of all additional
cosmid and PAC clones mapped. (d) The PCR probes used for mapping PACs.



A recent independent genome-wide sequence survey
(Rouquier et al., 1998) of human ORs produced six
partial ORs from chromosome 17 that map to this same
chromosomal location, although having slight sequence
variations. Therefore it is unlikely that additional
family 1 and 3 OR genes occur in chromosome
17. However, members of more divergent OR families
are present at other loci on chromosome 17, e.g.,
HTPCR16 in 17q21-q22 (Vanderhaeghen et al., 1997).
Indeed, we found this genomic region (GenBank Accession
No. AC005962) to include two OR genes belonging
to family 4: HTPCR16 and a new member of its subfamily
(approved symbols (Glusman et al., in preparation)
OR4D1 and OR4D2, respectively; see Fig. 3).

Characteristics of the OR Coding Regions in the Olfactory Receptor Gene Cluster on Human 17p13.3

Analytical Gene Prediction 

The sequence of the OR cluster was subjected to gene
prediction analysis using the GenScan and fgenes programs
(Fig. 4), which predicted 33 and 15 genes, respectively.
Table 1 summarizes the success rate for
each OR coding region (true positives). Overall, GenScan
yielded better predictions than fgenes, recognizing
99% of the coding sequence for apparently functional
genes and 51% of that in pseudogenes, versus 49
and 11% for fgenes, respectively. In two cases antisense-overlapping
ORFs (Merino et al., 1994) of pseudogenes
were recognized by GenScan: the OR17-208
pseudogene, which is interrupted by an in-frame stop
codon, and the OR17-210 pseudogene, which has two
frameshifting mutations. The fgenes prediction for the
OR17-1 pseudogene (which has several frameshifts;
see Fig. 2) partially relies on the wrong frame. Gene
modeling programs that do not use sequence comparison
generally are unsuitable for modeling pseudogenes,
yet a high proportion of OR genes are pseudogenic
(Rouquier et al., 1998; Sharon et al., 1999).
These programs are trained to identify multi-coding-exon
genes, but no introns that interrupt OR coding
regions have yet been reported, with one single potential
exception (Walensky et al., 1998). Since OR regions
can be recognized easily by protein sequence similarity,
we conclude that the method of choice for detecting
OR genes in new genomic sequence is to compare it to
a database of OR sequences using any alignment tool
able to incorporate frameshifts, such as FASTX or
FASTY (Pearson et al., 1997).

All of the statistically significant ORFs and most of 
the exons predicted represent OR coding regions (or 
their complementary strands), high-scoring segments 
within repetitive sequences (typically fragments of the 
pol-like polypeptide within L1 repeats), or very low-scoring,
short exons, none of which finds homologues
by blast (not shown). 

A Non-OR Candidate Gene 

A 297-codon-long ORF (nomenclature symbol 
C17orf2) was recognized by GenScan as a single-exon 
gene with a total score of 13.38. A polyadenylation 
signal is present 134 bp downstream from the stop 
codon. C17orf2, located at the telomeric end of the OR 
cluster, has a relatively high G + C content (63.2%) and 
is richer in CpG dinucleotides than expected from its 
nucleotide composition, especially at its 5' end. Even 
though the derived amino acid sequence was analyzed 
by exhaustive database searching, no homologues were 
detected. Borderline similarities to EST hits were not 
improved by clustering, and no particular fold could be 
assigned to the predicted amino acid sequence (not 
shown). The possibility exists that this long ORF has 
no coding content and that its length derives from 
the expected lower number of stop codons in a G+C- 
rich region; the composition-corrected expectation 
value for such an ORF is borderline (0.8). Alternatively,
C17orf2 could represent the first member of a
new gene family, in line with the estimate that approximately
50% of all newly detected genes may represent
novel families (Uberbacher et al., 1996).

FIG. 2. The olfactory receptor coding regions. (a) The extent of each predicted peptide sequence is indicated on top of the locations of the
seven-transmembrane domains (I to VII). Open circles denote frameshifts; black boxes indicate in-frame internal stop codons. The circled +9
indicates the location of a 9-amino-acid duplication in OR17-93 (Ben-Arie et al., 1994). Gene names and subfamily classifications are
indicated on the left. (b) The sequences recognized by the OR5B and OR3B degenerate primers: for each gene, periods indicate agreement
with the primer consensus, lowercase letters indicate the nucleotide found at a primer degenerate position, and shaded uppercase letters
indicate mismatches from the primer consensus.

An OR Pseudogene Turned into a CpG Island 

We have discovered a striking example of an OR 
gene whose entire coding region has evolved into an 
apparent CpG island. The OR17-1 pseudogene has the
highest G+C content of all those in the cluster (65.43%,
see Table 1) and includes 93 CpG dinucleotides (9.3% of
length) while the other OR regions in the cluster have
8-20 CpGs (1 to 2% of length, Table 1). While this
pseudogene has high G+C content and many CpG
dinucleotides, it lacks Sp1 sites, which prevent remethylation
of CpG islands (Brandeis et al., 1994).

While this OR unit lost its protein coding function, it
is apparently under new evolutionary constraints. Ancient,
Class I (fish-like) OR pseudogenes have been
reported to have adopted noncoding functions (i.e., regulatory)
as enhancers (Buettner et al., 1998). In addition,
the human matrix-attachment region (MAR) reported
in GenBank locus HSM0B2 (Nikolaev et al.,
1996) is significantly similar to OR genes, apparently
being an additional OR pseudogene, this time taking a
structural role (Gimelbrant and McClintock, 1997).
This MAR is mapped to 19p13.2 and is classified as a
Class II, family 1 OR. We therefore hypothesize that
OR genes, which are present in the genome in many
copies, can also adopt new functions, much as observed
for the pseudogenes of retroposons (von Sternberg et
al., 1992; Britten, 1994; Hanke et al., 1995).

OR Clusters Contain CpG Islands 

Compositional analysis of the complete sequence 
shows that this cluster belongs in the G+C-poor L 
isochore (Bernardi, 1993). Four CpG-rich segments 
(circled 1-4 in Fig. 4) were identified in the cluster. 

FIG. 4. A sequence map of the cluster generated by the GESTALT Workbench. (a) Compositional analyses. CpG contrast values and
%G+C are displayed as deviations from the regional average; CpG islands are denoted by circled numbers on top; Sp1 clusters are overlaid
in blue; green %G+C stretches belong in the L isochore, and blue stretches belong in the H isochore. (b) Gene prediction results (fgenes and
GenScan). Predicted exons are displayed in blue, with box height indicating exon quality (the scaling is arbitrary but consistent for each
prediction program); complete gene structures are underlined in blue; predicted promoters and poly(A) signals are indicated in green and red,
respectively. (c) Location of ORFs colored by statistical significance; brown and blue ORFs indicate E( ) value under 1 and 1E-3 for the cluster
sequence, black ORFs indicate E( ) value under 1 for the whole genome (3.3E9 bases). (d) Repetitive sequences. Alus are denoted in red, MIRs
in purple, LINEs in green, other interspersed repeats in brown; box height indicates element youth as percentage identity with the subfamily
consensus, from 50% (oldest) to 100% (youngest). (e) User annotation. Location, orientation, and intron-exon structure of the OR genes and
C17orf2; putative control regions are indicated in green; pseudogenes are indicated by open boxes; also shown are locations of the STS

FIG. 3. Phylogenetic tree of representative human OR genes of Class II, families 1-7 (shaded). Capital letters on branches denote
subfamilies. The OR genes from human chromosome 17 cluster are shown in larger font size and are marked with black dots. The human
β-3 adrenergic receptor (HSB3A) is used as an outgroup, and several catfish (Ictalurus punctatus) ORs are included as Class I representatives.
The bar indicates 10% amino acid divergence along each branch.

The most centromeric CpG island (circled 1) includes
the complete coding region of the OR17-1 pseudogene.
The most telomeric CpG island (circled 4) includes the
long ORF C17orf2. The two remaining CpG islands
(circled 2 and 3) are derived from recently inserted (<7
million years ago) SVA retroviral elements (Shen et al.,
1994). CpG islands 2 and 3 have many Sp1 sites (Brandeis
et al., 1994), as indicated by blue peaks over the
CpG islands in Fig. 4. CpG island 4 has four Sp1 sites
within it, and OR17-1 has none.

Our results indicate that the OR genes in this cluster 
do not have "private" CpG islands at their 5' ends: 
rather, a few CpG islands are present at an average 
frequency of one island per ~100 kb. These may be
regulatory sequences potentially affecting the expres- 
sion of the entire OR cluster or only part of it. We have 
similarly observed one CpG island in the 106-kb par- 
tial sequence of the human chromosome 3 OR gene 
cluster (Brand-Arpon et al., 1999), in the range 22-23 
kb of GenBank entry AF042089. As in the cluster de- 
scribed here, that kilobase-long CpG island is not as- 
sociated with any particular OR gene. 

High Abundance of Repetitive Sequences 

Up to 60% of the cluster sequence is composed of 
repetitive elements of all known types, including 
LINEs (40%), SINEs (9%), LTR elements (6%), and 
DNA transposons (3%). Several instances of repetitive 
elements retroposing into previous repeats were ob- 
served, with up to five levels of repeated insertion into 
the same locus, a structure we have named the 
"genomic matrioshka" (Glusman et al., 1998). 

Thus, the cluster appears to have been highly permissive
to repeated invasion by retroposing elements,
especially those of the L1 type, which amount to 38% of
the cluster sequence. Indeed, it is apparent that the
cluster has been evolutionarily "breaking up" into subclusters
separated by long L1-rich stretches (e.g., 75-135
and 270-335 kb, Fig. 4), sometimes engulfing
pseudogenes (e.g., OR17-208). Alu repeats (amounting
to 8% of the sequence) are somewhat clustered in the
regions surrounding the OR genes. The high proportion
of L1 repeats is consistent with this cluster being
part of a low-GC content L isochore (Bernardi, 1993)
within a G band (Gardiner, 1995).

Gene Duplications 

In previous work (Glusman et al., 1996), we described
the analysis of the tandem duplication of an
11-kb-long fragment, mediated by recombination be- 
tween mammalian-wide interspersed repeats (MIRs) 
and estimated to have taken place 90-100 million 
years ago. This recombinatorial event duplicated an 
entire gene structure, producing two apparently func- 
tional copies, which today represent genes from the 
same subfamily (3A). Analysis of the complete cluster 
sequence reveals additional instances of large duplica- 
tions that include entire genes. 



A tandem duplication of subfamily 1D genes. The
telomeric end of the cluster includes an additional example
of an ancient tandem duplication, also apparently
mediated by mammalian-wide repeats (of the L1
family). Figure 5a shows a schematic diagram of the
genomic region surrounding genes OR17-4 and
OR17-31 from subfamily 1D. As in the subfamily 3A
duplication, most of the duplicated sequences have diverged
significantly, accepting several retroposon insertions
and suffering various deletions. The two terminal
L1 repeats (of subfamilies L1ME3 and L1M3a)
can be discerned, as well as a hybrid L1 repeat between
the two duplication arms. Two pairs of segments
within the duplication display significantly higher de- 
grees of similarity. These are the intronless coding 
region and an upstream segment suggested to include 
a noncoding exon, as well as a putative control region 
(see below). Excluding these more conserved segments, 
as well as later retropositions, the estimated substitu- 
tion level from the original sequence is 15-20%, which 
corresponds to 56-65 million years ago. This is consis- 
tent with the older age (80-90 million years) of the 
L1M repeats flanking it. It is remarkable that in both 
tandem duplication events described, the coding region 
resides in the middle of the duplicated segment, while 
the putative control region is located in close proximity 
to its 5' end. The short distance between the retro- 
posons involved in the duplication mechanisms, and 
the gene control elements, suggests their location in a 
structurally more exposed region, potentially yielding 
an implicit mechanism for duplication of complete gene 
structures. 

Sequence expansion by retroposition. An additional,
similarly aged (58-65 million years old) tandem duplication
can be discerned, mediated by mammalian-wide
L1 repeats and including the complete OR17-6 and
OR17-7 (boxed in Fig. 4). The duplicated sequences
have also diverged significantly and expanded significantly,
from 8-10 to 17-20 kb per duplication arm.
This sequence expansion is due to repeated retroposon
invasion. An even stronger sequence expansion can be
observed following the OR17-4/OR17-31 duplication,
with the region surrounding OR17-4 expanding from
~7 to ~30 kb. Most of the added sequence derives from
L1 repeats, which enter both the intron and the intergenic
sequence.

A recent dispersive duplication. The OR cluster region
also contains the results of a very recent event of
gene duplication in which 30 kb of sequence containing
a full OR gene were copied, within a distance of ~160
kb. Comparison of the genomic sequences surrounding
OR17-30 and OR17-31 by GESTALT (Figs. 5a and 5b),
dot-plot, and identity plot (Fig. 5c) shows the existence
of two distinct regions of similarity: one covers ~24 kb
of sequence with >99% nucleotide sequence identity,
and the other covers ~8 kb with somewhat lower sequence
conservation (95 ± 3%). Both segments include
a variety of repetitive sequences. The first duplication
segment (99% identity) includes OR17-30 or OR17-31
in its entirety (coding region, upstream intron, noncoding
exon, and putative control region). Consistent with
the high identity level of this region, no additional
repetitive elements have retroposed into it, following
the duplication. On the other hand, a young Alu repeat
(7.1% divergent from AluY consensus, ~50 Myr old)
has retroposed into the second segment of the OR17-30
copy but not into its OR17-31 counterpart.

FIG. 5. Gene duplications in subfamily 1D. (a) Genomic region of OR17-4 and OR17-31; partial GESTALT map of genes and repeats as
in Fig. 4. The old L1 repeats postulated to mediate the duplication event are indicated by arrows. (b) Map of the OR30/OR31 gene jumping
and conversion events. The sequence segment including OR17-30 is reverse-complemented with respect to the absolute cluster orientation.
Thick horizontal dashed lines indicate the extents of the recombinatorial events. Vertical shaded lines are an aid for visualization. (c) Identity
plot of the duplicated segments including OR17-31 and OR17-30. Depiction of the hypothesis of extensive ectopic copying of 30 kb of sequence
(currently 95% identical) followed by the homogenization of 24 kb (currently 99% identical).

To analyze the mechanism leading to this duplication,
we examined the sequences at the ends of the
duplicated regions. A short (55 bp), very young L1PA5
repeat that flanks the OR17-30 segment at its telomeric
end is followed by a 265-bp-long A+T-rich simple-sequence
repeat. No short direct repeats flank the
L1PA5 element, even though it is a very recent insertion.
This L1PA5 retroposon also is not present at the
corresponding end of the OR17-31 segment, even
though its age is comparable with the divergence between
the older duplicated segments. Retroposons enter
the genome at specific sites, causing staggered,
double-strand breaks (Jurka, 1997). Up to 8 kb of nonhomologous
ectopic sequence were shown to be copied
in P-element-induced double-strand gap repair in Drosophila
(Nassif et al., 1994). We hypothesize here that
such a mechanism acted in the primate genome using
the genomic surroundings of OR17-31 as template. The
end result is the duplication of the 30-kb segment, with
OR17-31 being the original and OR17-30 being the new
copy. A later homogenization event then might have
copied the 24-kb sequence including the complete gene,
yielding the current structure. Since the sequence homogenized
in this later event is entirely contained in
the older, larger duplicated segment, the direction of
transfer cannot be ascertained.

Cluster Organization and Evolution 

Reconstruction of cluster history. The OR gene clus- 
ter under study has a very complex organization with 
genes lying in both orientations: eight genes from cen- 
tromere to telomere and nine genes from telomere to 
centromere. A very weak correlation (0.45) can be seen 
between the orientation of each OR region and whether 
a gene is apparently functional or is a pseudogene. This 
suggests the absence of a single, directional "locus con- 
trol region" for the entire cluster, as it would dictate a 
preferred orientation for functional genes. 
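
The text does not state which correlation measure was used. For two binary attributes such as orientation and functional status, one common choice is the phi coefficient (the Pearson correlation of two 0/1 variables); the sketch below assumes that choice. No per-gene values are supplied here, since they would have to come from Table 1.

```python
import math


def phi_coefficient(xs, ys):
    """Phi coefficient between two equal-length binary (0/1) sequences,
    e.g. orientation (1 = centromere to telomere) and status
    (1 = apparently functional). The encoding is an assumption."""
    pairs = list(zip(xs, ys))
    n11 = sum(1 for x, y in pairs if x and y)
    n10 = sum(1 for x, y in pairs if x and not y)
    n01 = sum(1 for x, y in pairs if not x and y)
    n00 = sum(1 for x, y in pairs if not x and not y)
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return 0.0  # undefined when either attribute is constant
    return (n11 * n00 - n10 * n01) / denominator
```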

Representatives of seven gene subfamilies are intermixed
along the cluster. This is in sharp contrast with
the largely unidirectional organization of many known
multigene clusters, e.g., homeobox genes (Garcia-Fernandez
and Holland, 1994), β-like globins (Fritsch et
al., 1980), and also ORs on human chromosome 3
(Brand-Arpon et al., 1999). This uniform cluster organization
usually results from repeated tandem duplications
and may be functionally important.

FIG. 6. A hypothetical reconstruction of the evolutionary history of the cluster, indicating duplication, jumping, and conversion events.
Gene location, orientation, and pseudogenic status as in Fig. 1. The scale applies only to the map of the extant human cluster. See text for
details.

The arrangement of the OR genes in the present
cluster may be minimally explained by a rather complex
series of evolutionary events, including repeated
tandem duplications, copying of genes to remote loca- 
tions within the cluster ("gene jumping"), repeated con- 
version events, and gene death by point mutation, de- 
letion, and recombination. Based on the family/ 
subfamily classification of the genes in the cluster, and 
on the estimated times for each duplication event, the 
possible evolutionary history of the cluster can be re- 
constructed (Fig. 6). It is likely that an ancestral clus- 
ter, composed of only two oppositely oriented genes of 
family 1, was tandemly duplicated, with each resulting 
gene becoming a subfamily founder. Later, several lo- 
cal duplication events and gene rearrangements most 
likely occurred within the cluster, as evidenced by their 
current high degrees of similarity. 

Members of family 3 are most likely present within the
cluster because, as depicted in Fig. 6, following the
initial duplication of the ancestral two-gene cluster, one
of the genes (crossed out in Fig. 6) was replaced (e.g., by
conversion) by an "invading" family 3 founder gene.
Alternatively, the ancestral cluster may have included the
founder family 3 gene in addition to the family 1 members.
The subfamily 1R founder may then represent an additional,
ancient gene rearrangement event. To clarify this, it will
be necessary to characterize the orthologous cluster in
more remote vertebrate species (Lapidot et al., in
preparation). While comparison to a paralogous cluster
that contains family 3 genes would also be informative,
this is currently impossible, since only one additional
family 3 member has been described in the human genome to
date (OR5-83, see below).

Intriguingly, the family 3 "domain" in the middle of the
cluster is on average more G+C-rich (42.9%) than the
family 1 domains (40.2 and 40.7% centromeric and telomeric
to the 3A region, respectively). Moreover, family 3 coding
regions are in general more G+C-rich than family 1 members
(Table 1). These observations suggest that the founder
family 3 gene derived from a different genomic environment
characterized by a higher G+C content, i.e., an H-type
isochore (Bernardi, 1993). After integration into the L
isochore of this cluster, the composition of the
originally G+C-rich sequence apparently has changed to
reflect that of the new environment, but the original G+C
richness is still apparent, especially in the coding
regions.
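
The G+C percentages compared above are simple base-composition statistics. A minimal sketch of such a calculation is given below; the function name is hypothetical, and the decision to ignore ambiguous bases such as N is an assumption, since the text does not describe how they were handled.

```python
def gc_content(seq):
    """Return the G+C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted; ambiguous symbols
    such as N are ignored (an assumption made for this sketch).
    """
    s = seq.upper()
    counted = sum(s.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = s.count("G") + s.count("C")
    return 100.0 * gc / counted
```

Applied separately to each domain of a cluster sequence, a function of this kind would yield figures comparable to the 42.9% and 40.2 to 40.7% values quoted above.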

Several events can be discerned in which a gene is
duplicated to a remote location along the cluster (i.e.,
gene jumping), which account for most of the intermixing
of subfamilies. Such events could involve gene
retroposition (Brosius, 1999). Some of the duplication
events involving genes of the same subfamily are mediated
by mammalian-wide repetitive elements (MIR and L1M). Since
these duplications likely occurred over 100 million years
ago, this gene cluster was established significantly
before the mammalian radiation, potentially at the
amphibian stage. The ancestral mammalian cluster is
therefore predicted to have included 8-13 genes (Fig. 6).
Since then, the genes in this cluster apparently have
undergone little further amplification by repeated tandem
duplication in the primate lineage.

On the other hand, there is evidence for recent genetic
exchange with other genomic loci, with several instances
of genes from this cluster being copied into other
chromosomes. The genomic clone Gl (Selbie et al., 1992) is
almost identical to OR17-4, but its genomic location is
unknown. OR11-13 (OR1D7P, GenBank Accession No. AF065866)
and OR11-22 (OR1D6P, GenBank Accession No. AF065868), on
chromosome 11 (Buettner et al., 1998), are >99% identical
in nucleotide sequence to OR17-23. OR13-66 (U86222) on
chromosome 13 (Rouquier et al., 1998) is identical to
OR17-2. Strikingly, OR5-83 (U86272) and OR5-85 (U86274) on
chromosome 5 (Rouquier et al., 1998) also are almost
identical to OR17-201 and OR17-2, respectively, suggesting
that at least 30 kb of chromosome 17 sequence, including
genes from two different families (3 and 1), was
duplicated into chromosome 5. It is therefore apparent
that duplications of single OR genes between different
chromosomes are not uncommon and do not necessitate
concomitant duplication of extensive genomic regions
(Trask et al., 1998).
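
Statements such as ">99% identical in nucleotide sequence" rest on a pairwise identity calculation over aligned sequences. The sketch below shows one common way to compute it; the function name and the convention of skipping gapped columns are assumptions, and other conventions (e.g., counting gaps as mismatches) would give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped (one of
    several possible conventions, chosen here for simplicity).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

The input must already be aligned; producing the alignment itself is outside the scope of this sketch.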

The Conserved Structure of OR Genes 

To predict the intron-exon structure of the genes in 
the cluster, including potential upstream noncoding 
exons, we analyzed the genomic environment of each 
OR coding region, concentrating on the features con- 
served between ORs that belong to the same subfamily. 
One to four upstream, noncoding exons were predicted 
for each OR gene in the cluster (Fig. 7). Exons con- 
tained within repetitive sequences were eliminated 
from the analyzed set, since we aimed to recognize
similarity due to exon conservation rather than
similarity between repeats. The sequences of the predicted
exons for each OR gene were aligned with the upstream
genomic regions of all other ORs from the same subfamily,
to recognize potential upstream exons conserved within
that subfamily. In general, the noncoding exons predicted
for genes of a given subfamily displayed sequence
similarity, but the upstream exons of genes from different
subfamilies or different families were much more
divergent.

Subfamily 3A. The previously predicted upstream exon for
OR17-40 and OR17-228 (Glusman et al., 1996) also was
present in OR17-201. An additional short exon was
predicted for OR17-201 and OR17-228, in close proximity to
the coding exon.

Subfamily 1D. An upstream exon was predicted for the four
ORs from subfamily 1D (OR17-4, OR17-23, OR17-30, and
OR17-31). The potential upstream exon for pseudogene
OR17-23 was identified only by fex and GenScan.

Subfamily 1E. A single upstream exon was predicted for
OR17-93. Its counterpart upstream of OR17-2 was predicted
only by fex. It is worth noting that this predicted exon
is located within a very old L2 repeat, which apparently
was present before the gene duplication leading to OR17-2
and OR17-93. Part of this ancient repeat may have adopted
a structural function as an upstream, noncoding exon.
Regions of similarity between these two genomic sequences
that contain no predicted exons are shown as open boxes in
Fig. 7. The genomic sequence surrounding the OR17-210
pseudogene does not show any significant similarity with
the upstream sequences of OR17-2 or OR17-93.

Subfamily 1A. One of the three upstream exons predicted
for OR17-7 shows sequence similarity with the upstream
exon predicted for OR17-6 (recognized only by Grail).

Subfamilies 1R, 1P, and 1G. OR17-1 and OR17-209 have three
predicted upstream exons, while one upstream exon is
predicted for OR17-208 (Fig. 7). Since full genomic
sequences of additional ORs from subfamilies 1R, 1P, and
1G are unknown, further subfamily comparative analysis for
OR17-1, OR17-208, and OR17-209 could not be performed.

Prediction of splicing sites. The two splice site
prediction programs (SPL and SSPNN) complemented the exon
prediction programs, as they can detect potential cryptic
or suboptimal sites. Splice acceptor sites were found to
be localized 6-471 bp upstream of the start codon of all
analyzed ORs (Fig. 7). The coding exons of two genes
(OR17-4 and OR17-1) had weak acceptor sites, with scores
less than 0.70 according to SPL and less than 0.80
according to SSPNN. Acceptor sites also were predicted for
the putative internal upstream exons of OR17-201,
OR17-228, OR17-1, OR17-208, and OR17-209.

Donor splice sites for upstream exons were detected by the
SPL program (score for weak sites less than 0.85).
Interestingly, "donor doublets" were predicted for the
upstream exons of all subfamily 1D ORs (OR17-4, OR17-23,
OR17-30, and OR17-31). The observed donor doublet
consensus sequence is
GCAGmACrGAgCAsTGGGTAGGGTsyGkmyrbCTCAGsCy, where the two
GTs five nucleotides apart (GTAGGGT) represent the
alternative splicing donors, and capitalized bases are
conserved in the four sequences studied. This suggests
that alternative splicing occurs in this subfamily.
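
The "donor doublet" described above is a pair of GT dinucleotides whose start positions lie five nucleotides apart. The following minimal sketch locates such pairs in a sequence; it only checks the GT spacing and does not reproduce the scoring performed by SPL. The function name and the `spacing` parameter are hypothetical.

```python
def find_gt_doublets(seq, spacing=5):
    """Return (first, second) 0-based start positions of GT dinucleotides
    whose starts lie exactly `spacing` nucleotides apart.

    Matching is case-insensitive, and ambiguity codes (e.g., K) are not
    expanded: only literal G followed by T counts as a candidate donor.
    """
    s = seq.upper()
    gt_starts = {i for i in range(len(s) - 1) if s[i:i + 2] == "GT"}
    return sorted((i, i + spacing) for i in gt_starts if i + spacing in gt_starts)
```

Run on the consensus quoted above, this reports a single pair, at 0-based positions 17 and 22, corresponding to the two GTs within GTAGGGT.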

Prediction of polyadenylation signals. A polyadenylation
site predicted by POLYAH (Solovyev and Salamov, 1997)
occurs 3' to the coding region for each OR gene in the
cluster, indicating 3'-UTRs of 200 to 1500 bp (Fig. 7).

Prediction of transcription start sites. Potential
promoters and corresponding transcription start sites
(TSSs) were predicted by TSSG and TSSW (Solovyev and
Salamov, 1997) and by PPNN (Reese et al., 1996). The TSSs
predicted both by PPNN and by either TSSG or TSSW are
marked with asterisks in Fig. 7. In addition, the very
highly scoring TSS predicted for one of the upstream exons
of OR17-209 is indicated by a double asterisk in Fig. 7.
The predicted promoters are all TATA-less, like the
promoters of other olfactory-specific genes (Wang et al.,
1993) and as suggested by the preliminary analysis
(Glusman et al., 1996). Initiator (Inr) sequences
(Javahery et al., 1994) are present in the upstream
regions of ORs from subfamilies 3A (excluding the OR17-24
pseudogene) and 1D, as well as OR17-2 and OR17-209
(Fig. 7). The predicted Inr sites do not coincide with the
promoters predicted by TSSG and TSSW but are located
within 800 bp of suitable splice donor sites.
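
The selection rule for the asterisk-marked TSSs (a PPNN prediction supported by TSSG or TSSW) can be expressed as a simple filter over predicted positions. The sketch below is illustrative only: the text does not say how close two predictions had to be in order to count as the same site, so the `tolerance` parameter and its default value are assumptions, as is the function name.

```python
def consensus_tss(ppnn, tssg, tssw, tolerance=50):
    """Return PPNN-predicted positions supported by TSSG or TSSW.

    A PPNN position counts as supported when at least one TSSG or TSSW
    prediction lies within `tolerance` bp of it (an assumed criterion).
    """
    others = list(tssg) + list(tssw)
    return sorted(
        p for p in set(ppnn)
        if any(abs(p - q) <= tolerance for q in others)
    )
```

For example, with hypothetical PPNN predictions at 100, 500, and 900, a TSSG prediction at 110, and a TSSW prediction at 880, the positions 100 and 900 would be retained and 500 discarded.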

Potential Transcriptional Control Signals 

The availability of the complete sequence of the clus- 
ter provided us with the first opportunity to compare 
the upstream genomic regions for the OR genes that 
are clustered and that might be expected to share 
common control features. A dot-plot and ClustalX 
alignment comparison of the 15 kb upstream from each 
of 16 OR ORFs in the cluster (no upstream sequences 
are available for OR17-25) showed significantly
conserved segments within subfamilies, but no extensive
sequence conservation of upstream regions either be- 
tween subfamilies or between families. It is apparent, 
therefore, that genes belonging to different subfamilies 
have diverged significantly in their upstream regions. 

No recombinatorial signal sequences. It can be hy- 
pothesized that the clonal exclusion of ORs is at least 
partially based on somatic recombination, which would 
then join an OR gene to a putative locus control region. 
Somatic recombination joins gene segments in
immunoglobulin heavy-chain genes via recombinatorial
signal sequences, or RSSs (Sakano et al., 1981). The
WordMapper tool was used to detect such signals in 
this cluster. No suitable RSSs were found in the 
genomic environments upstream of the OR coding re- 
gions. 
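
For orientation, a search for immunoglobulin-type RSSs can be sketched as a scan for the canonical heptamer (CACAGTG) and nonamer (ACAAAAACC) separated by a 12- or 23-bp spacer. This is not the WordMapper tool used in the study; the function names, the per-element mismatch threshold, and the restriction to the forward strand are assumptions made for this illustration.

```python
HEPTAMER = "CACAGTG"    # canonical RSS heptamer
NONAMER = "ACAAAAACC"   # canonical RSS nonamer


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq, spacers=(12, 23), max_mismatches=0):
    """Return (heptamer_start, spacer_length) for each candidate RSS.

    Only the forward strand is scanned; `max_mismatches` is applied to
    the heptamer and to the nonamer separately.
    """
    s = seq.upper()
    hits = []
    for i in range(len(s) - len(HEPTAMER) + 1):
        if hamming(s[i:i + len(HEPTAMER)], HEPTAMER) > max_mismatches:
            continue
        for spacer in spacers:
            j = i + len(HEPTAMER) + spacer
            candidate = s[j:j + len(NONAMER)]
            if len(candidate) == len(NONAMER) and hamming(candidate, NONAMER) <= max_mismatches:
                hits.append((i, spacer))
    return hits
```

To cover both strands, the same function could be applied to the reverse complement of the input; an empty result corresponds to the negative finding reported above.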

Detection of a specific CT tract. A global comparison of
all 16 sequences found, as expected, significantly shared
patterns among members of each subfamily. Seven of the 16
genes were therefore selected as representatives of the
different subfamilies for further analysis (OR17-7,
OR17-31, OR17-2, OR17-209, OR17-208, OR17-1, and OR17-40,
for subfamilies 1A, 1D, 1E, 1G, 1P, 1R, and 3A,
respectively). The genomic environments of the 7 selected
genes were examined with the oligonucleotide analysis tool
(van Helden et al., 1998) from the Yeast Regulatory Tools
(van Helden et al., in preparation), using a pattern
length of 8. As expected, the highest scores (representing
patterns present in most of the sequences) corresponded to
patterns that are part of Alu repeats. Ignoring these, the
highest scoring patterns were seen for pyrimidine:purine
(Y:R) tracts, the CA repeat, and CpG-containing patterns.
A similar analysis on sequences in which interspersed
repeats were premasked again gave highest scores to Y:R
tracts. Similar results were obtained by analyzing both
strands simultaneously (not shown).

FIG. 8. Multiple alignment of the CT-tract sequences, listing gene name, pattern position relative to the ATG codon, orientation (V for
complementary strand), and actual sequence. In the consensus line, uppercase indicates consensus by plurality of 85 or 90% for unambiguous
and ambiguous bases, respectively. Y denotes pyrimidines; M denotes A or C; W denotes A or T. Shaded bases indicate matches to the
consensus in positions where the consensus is unambiguous.
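
The idea of scoring oligonucleotide patterns by how many of the input sequences contain them can be illustrated with a simple k-mer presence count. This sketch is much cruder than the oligonucleotide analysis tool cited above, which evaluates statistical over-representation; the function name is hypothetical, and plain presence counting is an assumption made to keep the example short.

```python
from collections import Counter


def shared_kmers(sequences, k=8):
    """Count, for each k-mer, how many of the input sequences contain it.

    Each sequence contributes at most 1 to a k-mer's count, regardless
    of how many times the k-mer occurs within that sequence.
    """
    presence = Counter()
    for seq in sequences:
        s = seq.upper()
        kmers = {s[i:i + k] for i in range(len(s) - k + 1)}
        presence.update(kmers)
    return presence
```

Calling `shared_kmers(upstream_regions, k=8).most_common(20)` on a list of upstream sequences would list the octamers shared by the largest number of genes; as noted in the text, repeat-derived patterns would need to be masked or ignored before interpreting such a list.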

When the segment pair overlap method of MACAW (Schuler et
al., 1991) was used to detect longer conserved sequences,
a pyrimidine-rich segment (hereafter named CT tract) with
consensus CTTYTCCCTYTTNTCTCY was found. Using the Word
Mapper tool of the GESTALT Workbench, the positions with
significant similarity to this consensus were detected and
mapped (Figs. 7 and 8) and also could be correlated with
the predicted splice donor sites in a subfamily-specific
fashion. Specifically, the CT tract is contained within
the putative control region conserved in genes of
subfamily 3A (Glusman et al., 1996), as well as in the
noncoding conserved sequences of subfamily 1D.
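
Mapping a degenerate consensus such as the CT tract onto a sequence amounts to matching IUPAC ambiguity codes position by position. The sketch below is a simplified stand-in for the Word Mapper tool, whose scoring scheme is not described in the text; the function name, the mismatch threshold, and the forward-strand-only scan are assumptions.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

CT_TRACT = "CTTYTCCCTYTTNTCTCY"  # consensus reported in the text


def find_consensus(seq, consensus=CT_TRACT, max_mismatches=0):
    """Return (start, mismatches) for each window matching the consensus.

    A base matches a consensus position when it belongs to the set of
    bases allowed by the IUPAC code at that position.
    """
    s = seq.upper()
    n = len(consensus)
    hits = []
    for i in range(len(s) - n + 1):
        window = s[i:i + n]
        mismatches = sum(base not in IUPAC[code] for base, code in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits
```

Raising `max_mismatches` loosens the search, which roughly mimics detecting positions with "significant similarity" rather than exact matches.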

To study the generality of these findings, the Gibbs 
sampler method was used through the Web interface of 
the Yeast Regulatory Tools (van Helden et al., in
preparation). Using patterns of various lengths, but especially
^30, only Y:R tracts that comap with the CT-tract motif 
detected using MACAW were detected. The sequences 
surrounding the CT tracts are enriched in C+T beyond 
the specific consensus sequence described (Fig. 8). 



Therefore, the most significant pattern common to 
most potential control regions, beyond trivial similari- 
ties deriving from either historical conservation (be- 
tween genes in one subfamily) or sequence repetition 
(of retroposons), was the presence of pyrimidine:purine
tracts, which are located near splice donors. The CT 
tracts could in principle be an olfactory-specific recom- 
binatorial signal. On the other hand, their location 3' 
to the putative upstream exons weakens this possibil- 
ity, as such exons and splicing signals would be miss- 
ing from the selected gene. 

Pyrimidine:purine tracts have been shown to pro- 
mote unwinding of the double-helix (Bucher and Yagil, 
1991) and to be implicated in regulation of transcrip- 
tion and in posttranslational regulation (Valcarcel and 
Gebauer, 1997). Within the observed tracts, a specific 
motif (CT tract) could be defined as consensus, suggest- 
ing the conservation of specific patterns for transcrip- 
tion factor binding. 

Mapping of transcription factor binding sites. Two
families of transcription factors are expressed in the
neurons of the olfactory epithelium: the O/E family,
including Olf-1, Olf-2, and Olf-3 (Wang et al., 1997), and
the NF-1 family (Baumeister et al., 1999). Olf-1 and NF-1
binding has been demonstrated for promoters of the
olfactory-specific genes OMP, type III adenylyl cyclase,
and olfactory cyclic nucleotide-gated channel. For Golfα,
only Olf-1 binding has been shown (Wang et al., 1993;
Baumeister et al., 1999). Several potential binding sites
for O/E and NF-1 transcription factors now have been
identified in the genomic surroundings (up to 15 kb
upstream) of each OR gene in the cluster (Fig. 7).

Using MatInspector, one or two Olf-1 sites were found with
scores above 0.850 for most of the analyzed ORs (Fig. 7).
Predicted Olf-1 sites for OR17-40, OR17-30, and OR17-31
have scores from 0.820 to 0.835. Additional Olf-1 sites
were observed upstream of OR17-24, OR17-2, OR17-93,
OR17-210, OR17-6, OR17-7, OR17-209, OR17-208, and OR17-1
when mapping the Olf-1 consensus using the Word Mapper
tool. Generally, the Olf-1 sites for these ORs were
predicted with a lower score than Olf-1 sites for other
olfactory neuron-specific genes. This most likely is
because the previously described Olf-1 sites were located
in rat genomic sequences, with the sole exception of the
human OMP Olf-1 site (Buiakova et al., 1994). Therefore, a
low score for human predicted Olf-1 sites might reflect
interspecies differences. In addition, Olf-1, Olf-2, and
Olf-3 bind to Olf-1 sites of olfactory neuron-specific
genes with different affinity (Wang et al., 1997). Thus,
it is likely that different members of the O/E
transcription factor family bind in vivo to OR Olf-1 sites
and to other Olf-1 sites. Except for the OR17-23
pseudogene, all analyzed gene upstream regions were found
to contain strong potential NF-1 binding sites, but no
conserved patterns for NF-1 localization could be
distinguished within the analyzed OR subfamilies.

CONCLUSIONS 

A large human genomic region including a cluster of 
17 genes of the OR superfamily has now been fully 
sequenced and characterized. The only potential 
non-OR gene identified was at the telomeric margin, 
suggesting that this uninterrupted cluster evolved by 
repeated expansion. The inferred primordial cluster, 
suggested to have been established in an early amphib- 
ian ancestor, presumably included only a few OR 
genes, which gave rise to the two different gene fami- 
lies observed in the extant sequence. The cluster has 
not evolved by simple tandem multiplication of its ini- 
tial components, but has apparently grown in complex- 
ity by several recombinatorial events, some of which 
are relatively recent. For some of the recombinations, a 
mechanism may be discerned, involving interspersed 
repeats (retroposons). The interspersed repeats repre- 
sent 60% of the sequence in the cluster and belong 
mainly to the LINE family of retroposons, though 
SINEs and DNA transposons are also present. The 
intergenic distances vary significantly (5-67 kb) and 
are related to the amount of inserted repetitive se- 
quences. At this stage, it is unclear whether repetitive 
sequences within the OR genes affect their expression 
to any extent. 

Significantly, the observed recombinatorial events involve
complete genes, suggesting an evolutionary mechanism for
preserving intact gene structures upon duplication. The
common gene structure has been delineated by computational
analysis of the OR genes. This was found to include an
intronless terminal coding exon, terminated by a signal
for polyadenylation (0.15-1.5 kb downstream from the stop
codon) and preceded by introns (0.5-11 kb long) and by one
or two short, noncoding upstream exons. The resulting
common gene structure is consistent with that which we
previously described for OR genes belonging to family 3,
with the addition of the possible existence of more than
one upstream noncoding exon for each gene. The functional
role of this stereotyped structure is still unknown. The
upstream noncoding exons might play a role in the control
of mRNA fate or subcellular localization.

When the complete genes in this OR cluster are 
compared, several levels of conservation may be dis- 
cerned. Within each subfamily, the coding sequences 
are most conserved, the putative control regions and 
noncoding upstream exons show an intermediate level 
of conservation, while the introns and the intergenic 
sequences are the least conserved. Between subfami- 
lies, the overall intron-exon structure of the genes is 
more conserved than the specific location and the qual- 
ity of the relevant splice signals, while the putative 
control sequences are the most divergent, with only 
their pyrimidine:purine tracts and Olf-1 transcription 
factor binding sites conserved. The Olf-1 transcription 
factor binding sites may play an important role in 
olfactory-specific transcription of the OR genes, while 
the pyrimidine:purine tracts, previously shown to promote
melting of the double-helix for transcription initiation,
may serve an auxiliary control function.

A sizable fraction (6 of 17) of the coding regions in the 
cluster are pseudogenes. One of these (OR17-1) apparently
has shifted function to become a CpG island.
Other examples are known where OR genes adopted 
new, noncoding functions, e.g., promoters and matrix 
attachment regions. This appears to indicate that OR 
coding regions have a special plasticity, allowing them 
to evolve new functionalities. A potential explanation 
of this versatility, as well as the prevalence of pseudo- 
genes, may reside in the variability within the OR 
superfamily and the partial functional redundancy of 
OR genes. 

Clustering of the OR genes may play an important 
role for initiation of their transcription by common 
enhancers. Each of the identified OR genes appears to 
have its own, independent TATA-less promoter region. 
This finding and the apparent lack of recombinatorial 
signal sequences suggest the importance of trans-act- 
ing factors for regulating the excluded cellular expres- 
sion of single OR genes in the cluster, rather than a 
somatic DNA rearrangement mechanism. The cluster 
includes CpG islands, potentially affecting OR gene 
expression. Two of the observed CpG islands derive 
from recently inserted SVA retroviral elements, pre- 
sumably absent from the genomes of New World mon- 
keys and other mammals. Further work will be re- 
quired to ascertain the functional role of these 
potential regulatory signals. 
*7t. m 2 m iz ^ z ?i z>p&w.mm (in «. < ^^gtticntw tct, 

(1) A^T-^^I (phy tosiderophore) W^tfc. (2) *g @ ^ (D A ^ * & £M <D fe 

tb, c 3 ) & t i±if*mm£ <d pim&m&fc<DWj$.. (4) ««jflci:«k5A^^ 

^£I-££*i£-f£<D®:iK, <h ^ *o fj: -d (Takagi 1 976. Takagi et al. 1 984) „ 

< ***4t«!ifc £ tc^- & c o ct -5 tnWiWLmm (in k: b p 

— S^^^O^fJl'.!: LTa^n^iS (Saccharomyces cerevisiae) it 
fflrfiaLfcfRiJXfcfiS (I) tC^^Ufc»cKJR&fT-5. ffi »*i«3 K & It -5 it & =?■ V ^ 

T <Z) (iff Fe (II) <£> h ^ >7stf-d? — *t|*#a>tfc!lS:JK£Rtfc<7) □ > 7° U 

/ >f--> a >i:ioT^n-- >^$ftT^?.(Eide et al.. 1 996)^', £ fz'M 
t^^^Ii^CD is X 7=" A iZ ~D UTteSS^ e> tlT l^TScti. 

ttlliaLTlf (Saccharomyces cerevisiae) T <7) tS AS # fli? IC S¥ L < 
l^btlTl^. II S d i3 £ t£ PiS i|X . £T*PIJiaffiSS®T = fiHi»cil7cBS& F R 
E 1 . FRE 2 tC=kOH{ffii£tf}-{3fi$c^<Dit7E£fT? (Dane is et al. . 1990. 19 
92. Georgatsou and A 1 exandrak i. 1 994) . & Jt £ tl fz — {35 ft * Iffl US 1*1 Ux «9 & tT 
#©<i: LTjaSSafP-ltttlfll (high affinity) £ . {£ M *n 14 $S #f (low affinity) (D 
2 o<7D(RiK«flia<feS. 

SSM^nltttlflltCfcSttiaiRtt, -?)ls^mm<tBm (multicopper oxidase) FE 
T 3 £ — iiffift ©B^ft (Askwi th et al. 1994)£?T^fc&. S b < tt = fffi ft © 
h7>X*-^-FTR 1 (St carman et al.. 199 6) 1r^fflBai*l(C?XOiiSn^><i: 
^Xbnt^S. Z. Z.T* F E T 3 (C=t & Zlfifli fi* K> ff B£ ft fC . 

<h ^ o T *5 0 (Dane i s et al. . 1994. Klomp el al. . 1997). F E T 3 iZ M £ 
£ ifcl2& \Z ~D T m b tit ^ 5 (Yuan el al.. 1 995. Lin et al.. 199 

7) 0 

4 (Dix et al.. 1 994,1 997) CDffift^flctS&CDTcbSt^AbtlTI/^-So 

T t> £ W T $ -5 ffi ft £ i'J 3ft -T * - t. />< "J ft£ £: ft 5 . 

#36i?H#£te» ^(OSWT. Dr. Dancis (NIH) J:Oii$§(tfcii«FRE 
lififsf*. 3? A':n^#AL7i^K$E^£nfc^ A3 SrtJSSLfc (ilin. 1 995) . 
L^L. FRE liefroiASntiliHi^A'^Ttt, ^ CD il tc *S 14 « W 

± m \z it ^ f£ ft ^ * b n ft ^ -3 . / - if > a -r ^ 'jy^t'-yax^iss, @# 

M CD it f£ f F R E 1 ^ A zi IMS <Z> *£ ^ j£ ft *< 0 . 9 k b t /h $ ft & CD f- ft o 

dcoi; o ftg«£ft<Z>ifief SES^ffift ^39 A liot LH£?^^iy£o 
fzMtiLT. rt?-)V7sffi (Bacillus t hu r i ng i ens i s ) CD =r )V 5> - X > K h * -> > 
( 6 -endotoxin (2£&ttCD^ > A° 9 K) ) S3 - KtSlfif HC r y*<fe5. 
4 2 J^±CD C r y ft L . i:SiJt I. i 4 ^©^7 7 7 (cry I~cry IV) {C 3t 

It b *l£ (Whi teley and Schnep f . 1 986) „ Z. CD £ t£ ^ > /I £ R £ n - K t~ S ift(5 
f #Ei3S*iftK:#X£nfc#, -5 * < 56 51 Lft^ o 0 , m%&\Zf&$L&W&*>T 
^fz z\ t. rflofr t5 fc. 

d CD JEE <h L T . (1) 3 F >*'Jffl *P (cod on usage) coil . (2) Cryjfi 
fifC A T ^g#*S5 (3) mRNAiDT^ttt. (4) C r y it f CO — SB 

#W > hd>iLTX77 / f y/^Snil^, 3? # * b ft T ^ -S . 

c r y i£? uzmmmw-ii'vbmMizftmz -its /tfen. wfttw^^tsssB^j 

^ ft ^' ft C r ylfef^i L, 7"7<-7-£^)$LTPCRi:J;0^i&fi)ct 

* co^jaacoiti ^r^g^ii £f£is l/c zitfefBfifrsnT^s 

(Perlak et al., 1991, Fujimoto et al., 1 993. Nayak et al., 1997)., 

z. co J: o ic. ffi 3? «§ ft \z ftli co 3e ft co it e f ft a & m A L- T W n U & T 5 d <h tt 
itu IB L 7c <fc -5 M * (7) CD #t % A b n T 7c . 

n fc i$i m m % ( c & ^ t . a m a it e ? a* -t # i -z % ^ -r -a 7~ * o g is £ $& * w 5e 

ltt/ct:5, )gRfi|»$n5««©inRNA<0 5|?U (A) (3 m 1£ T -2> S 

^ <t> & m & £ 11 t & *> <d t & & . 
% <p na ^ 

m R N A W # U ( A ) HQ \Z !S& %k T 3 S ^ tf> M J® £ » mRNAOJp'J ( A ) ft JP 

it5Sfti:rif5. RtlEL/zmRN Affil U ( A ) ft JJD iC M % T Z> S £ <D & Js£ 
O iM. m BE ?'J d= L T fi . AATAAAiO^iE^J^ifSK. £ 7c . S 8* m R N 
A<D#'J (A) #JJP(Cia«-rSS3R<0««^. G T - U -y ■^fctSSBEJ'J ©Til 

i* . jgRftiftsnsf f > *ij ffl ^ t- s ^ r fT n 5 ^ t a< s? s l ^ . 

<£> BB & zi h* > 60 _t m. IZ . n U-* -y i7 IE yiJ ( K o z a k IE ?'J ) StT5:tA<ifJL 

Its. #se9i <dm91&& $ n/c*rffl«4fctt. ifltTgoti, ifTft^Tfc 
cfc < <td ^ & k -d ^ t tt m«j m. « & ^ . 
$ e id. # seism. gtf8eo^jtte«i-r * $ nt§ <5i££E5»j7&tafcg $ 

* ^ bj? a> ^ <& m. s @e #i a > m n e & $ n w ffl m m \z & ^ t ft m m -v % m t 

( A ) f>flJP(CRI«-r4SjRT*0. S IS # U ( A ) flPK "T 3 S 5?t <D g|$ # £ 
ftU « % S IB ?'J T S & $ n T ^ * d <*: £ # & £: -T 3 *> <73 T & 0 . $ b (C it . #AS 

n § it e ^ <7) % m <n g &. t turn m & m # & 'p & ^ z. t. . SA^nsise^©^ 

*'J (KozakSE^J) $tt§ d <h # if * L ^ . 

y ^ y ^ > h^^^^it^^t^^mtr^mm^mmm^mm-r^yj&izmt- 



83 2 Bite. *i4fca>8c#<0i!8:JK«fl& (ID £ ^ 1 1> <7) T * -5 . 

m 3 ia . is; 3? is 4& k *s n 5 # u ( a ) ftjocfii^^tfcicT*^, 

m 4 HI fi . HS^ilEf F R E 1 (75 G & T U >y ^ & gB #| £ ^ "T t> <7) T ' & -5 . 
I51tt, r e f r e 1 W^fiitS^t^S^t fc<DT^I>. 

6 Hlte . r e f r e l«5^fi)t(-ffifflL.fc3 0^(75 "7" — <7) SB ?'J £ f t> 

7 HHi . r e f r e l<BBa3aj£»^Lfc77<V— £<DM$&£^Tfc<AT& 

' m 8 EH£ . r e f r e 1 ffl^S©MJS$itt)fflTS5 ( 
fg 9 111 fi . fx It L 7c r e f r e 1 W £ IE 9*1 £ ^ "T t> CD T & £ . 
^ 1 0 ®IJ« FRE1 ( ± © ) t r e f r e 1 WGatfTOttlSIfiJt 4 
mi 2aii. *iio^itei$nfci^©4t £*-r?iT^?,. 

Il 4 la tt . r e f r e 1 £ zf □ - :/ t L fc W te&ft: CD > M -f 7" U y •< 
•tr — ->3>«)|g***-rfc©T?fe2>. ifcflHiE c oR I tH i nd I I I 

«fc -2> ?g ft CO il T ' & 0 . SlliH i n d I I I T CO ?8 fC «fc £ CO T & Z> . N 
o. 1— No. 1 2 lijg 0 . W. T. IF £ M T ifc <5 . 

mi 5 13 . r e f r e l£7a — 7<tLfcJBMteB!fr©/— +fWW:/U^ 
-Y-if— -> a > CO JH £ ^ fcOTfc*. No. lR^No. 2 W:fl£SCte&ft:&3R 
W. T. ttt^iTiS. 

mi 6 1tt, 4SK::fettSHflBlfcig5n&3fc<E>Jg'l4£. Fe (II) t;<k§BPDS 
-Fe (11) WL&toam t^fi^St?lTS S o £IJ(if4lTS0, * fg 

« £ n & (,> # . £ tij cd m n $5 & ft: co $ t tt # tr> % a* ai m -c # s . 

^1 7@B« BiJcDf I$Ki<$&ffl^T?g 1 6|g|i:|pIi;ilSI*«OMLfca^<75 

|gl 8 1IJ. M « $5 & ft: CO $E ^ b n tilt 2 ttt R g * ffl t> fc *g \Z *S S = fiffi 3£ 
il7cP^co?S14 * , BPDS-Fe ( 1 1 ) IS ft: co # ^ % fe tc J: 0 ^ L fz W- M T 
BW&mfocD 2 fltf^S Cfcffi'J) i:t>BPDS-Fe (II) «^#:l:J:S# 

*%bjco#a$ n-g)fi!icoja{E^<h utt. -ti^^^uTWffl^feco-e^) 
«s ^ tc *t l t m m m iz m & u t> co -v & t *> «t ^ . i& ^ 51J & £ co # -r 3 

v> m& ft Htfff 0»JAI£, 0*coB^ijX{cP^^-T-g)SI#coHfiE^jl7C^^ F R 
E 1 ft L V>. 

t^IitlT, mRNACO^U (A) in £ fli £ cm T <^ £ & S @E ?'J rt* £> -5 

d t £JLtti L tzo £6fC. d<D#U'(A) CD fafin £ £ C^t T ^ £ & 3 SB ?'J <7) _L 
fiftfC G T - U y f^Ilie^J^^lTS 5 d <h * JLfctl L fc. ^.SD, G T - 'J 7 

^&ifiSffi3Fy#*5£. i«-ctt#u ( a ) 0 an # yt s £ n . ^©^t?t<5 

# U ( A ) fUtf. AATAAAfilOlIEJlJCl 0^30bS5(?) 

® Fir T m R N A £ 9] Br L- . # 'J ( A ) # U * 7 - tf <D ®) # T' # U ( A ) ^ ft Jjn T 
5©T»5. Lfc^'oT, m A $ flSlfif^ d <D «fc o ftt£SI&?'J £ W L T 5 

AATAAAi©iIffi?iJ^tf^)*U (A) is 5? i~ JU CD 1 0 ~ 3 0 b f£ 
^TmRNA^IRSnsitca*. 

btA^x> ^^HjtiSA^n^jte^^^^ffi-rs^i^f'^ w^>^ u (a) 

-> ^ ^" ;U . AATAAAi»£lffi 5»J » if SL<ttGT-'J -y ^ ft *g X 

7 5 7 gg SB £'J 15 JE L ft ^ ? lc £> Z\ ft 5 <D ifift £ L ^ . 

#U (A) S' ^ :f- <7D tfi g BE ?U <7> & & <Z) * & bf, G T - ij 7?^H1 
IB?'J£|i&< d <h fcfiF £ L^o Sf:. ifiSSCGT- 'J -yfft«SE?tJ^#ffiLT^ 
S«-&fc:tt. -?-05TSlEflJtmm-r*# U (A) i/^t^«©lfiaE5>JC«lsltTm 

* fg w cd & «t n t£ . mr ia w & & icsn^T^A^nsite^o^ M 
fejc iz to tz 1 t £ a <£> g & is c (o w fi o) m w >p ft < ft ^ j; o \z & ^ -r z> <d a< s 
LO. £fz. mR N AO^&feitM&i £ L T $B €> ft T H § , ATTTAE?i]Jf 

m R N A ^' i $ <t < a ER $ n 5 /: <D E ?i] i IT a b T ^ 5 □ f 7 ^ E ^1 ( K 

t a< t # „ #j a «\ # -f > h s i — 9- — -> a > m m m m \z £ o ^ si? l t & s 
ts c t & a <d&m (Dum-en -s - £ t- # & „ 

e^^fi^Ji^T'^oTt), z. n e, £ ^ < r> & o 7 v * > h \z ft m l t . p c 
a< t #r & c t « a m m \z. « e> a* t * & . 

<&otu5 ( as 0 . 9 kb) z: <t a* e> . -(Di^(ii^#^bfcc mmo 3 m&m. 

/ill IS £ LI, 

( 1 ) mRNA©-8RA<< > h □ > t LT« 0 HlSnTlfii. 

(2) SSIRpI*i (coding region) roi4>TK?/i^t)oT^-i). 
tt^2 ^ K> IJfflEI* £ ft/S.. £&fc5##r£(T-5fc«>RT-PCR£fTlr*. 

F R E 1 ^ritfe^SA U7t^M$E^^/\*3T-«iaiRMfcl (coding region) 

T-#u (a) oftjjp^e z. r> t ^ -2> z. t tfmw L fc. 
^buz: s-e»9<z)jief siBi^ffl^d^x u%^*«?tig$ nt^5« t l 

T<>^;U^ — if (Hincha. 1996) ^ 0 . F R E 1 ISftDiAC 
F R E 1 L &fl25*$E&* A' □ T^fiCDm R NA««Tta ^Ift tt. F 

RE 1 (Dliit (coding region) CDi^T^U (A) -ft" JD L T t ft T S 

# U (A) ifi ft M T £ S (poly (A) site) fj — 6- P/r T? fi 75: < , ^ tl ^ tl CD ztf 
'J ( A ) -It -f h CD ± Wi C fi til *MC S IS m ifi A A U A A A t£ CD # U ( A ) -> ^ 
;U£i§3£ StlfcffiHa (putative poly(A) signal) a*#£Ljfc. L^b, A A U A 
A AliOSSJiJ £ (,> 5 CD te> F R E 1 co 5 ' M K: ^ < o rt^ftT £ dtlbCDil 
ffi-eti^u (A) cofi-jqteiB z: o x ma ^ d t *<t>^ o fc. 

4jg4$9 (3 is I^T # U (A) <BttSn£i*^'CmT(^ # 'J (a) ->y-^;ucD 

$ b (~ ± 8iE fC & ?i T & . G U - 'J -y ^ 75: @S ?iJ CO Tj X \% ts. ^ £ # x. e> tl „ OS 
0 G U - 'J >y & JFfl # & £ tffi^lTtt* U (A) CO ft JD 3£ £ *l . ^ CD & T 
T< 5AAUAAAiOie?iJ(7) 1 0-3 0 bi5© rp y Aj TmRNA£9JBf 
L . # "J ( A ) tJ? U / 7 — if CO ill € T jJ? U ( A ) -T -5 CD 7c -5 , 

*s^, u (a) <z> an ass nira t & o 7c g u - »j «v ^u^Lmtumm 

X\%, # 'J (A) OftHl$i*]|Lt^5^tAt. FRE 1 SilAbfc^Ife^^ 
A3T^£gOmRNA#T#&^MHT*S<!:#*.e>*lfc. 

Hfffi^il7Cl#^FRE 1CDGT-U -y ^ T & -5 £ S =b n £ @E ?'J £ 31 4 III ^ T . 
m 4 El CO m ft X m o 7c pI t£ 75* G T - U t#A&n5fl(«T*5. 

fiEoT. ^A3THflfiail7C**FRElS:5eig$-B-Sfc«e)Ctt, FRElI 

ef ogt-ij 7f ^se^^ittttf^^i^x.bn^. l a* l . m a £ t cd t z. 

3 m Vi CD # U ( A ) ftttlS:j*5£^tt*iE^Jf*'^^^Ttt3>-fc>-9-^^GU- | J-y 

L&3>-fc>-trX7Ffra>6»&^UA±* ^CDK?U £ fc ttT"te5££:g com R N 

A £ t# e> ft & ^ *J & # * -5 . ifel:*^T^iS©mRNA7)^)3)c$n5ct^ 
»C . FRE l(7)7 5;iffi?il$f At IC> JBRiE«iSnstt«3<Z)ZlK>fiJffl^ 
(codon usage) \Z. $> t> i+ t&S BE ?'J £ H$ It f S - <*: fz L . 
tlT, # 36 9! # €> tt » ^ =1 T' & M CO H fiffi ft M 5c ^ * ^ 31 $ a I- . 




WO 99/48356 PCT/JP99/01481 

F R E 1 <DT ^ J ^ IS ?ij £ A T Id ? A' =l <D =3 K >mmm (codon usage) IZ & -o 

fzi&mMzvuzmm n ltz. femMmnmEf iz % tz ^ t -a g v> & m l tz « 

(1) GT- U •yf^f^i^fc. 

(2) # «J (A) y^t^t^tlSSlE^AATAAAiiitffnicKfe 

( 3 ) i& s §2 #i £ m is l ^ -r ^ -5 (c, 4 o o b p t tc m m. s# m £ 

~D< 0 , 5-3(7)-tl^^>h{3^ttfc (4 17~436bp) . 
(4) m R N A©^^£ftiS5»J <b ^t>tl2>mm&fi\ A T T T AiH^'J (Ohme- 
Takagi. 1993) £ft < Lfc. 

( 5 ) ± m m (c t> -3 x m. s g si c o # ffi \z m & ^ «t ? tc n k ><d & b 
£ a n m x tz . 

(6) K o z a k SS?'J tffHtlSfi^ittT', mRN AMI<t<»S!$n 

3 #>.<75gB?iJ (Kozak, 1989) * Pjfj £6 n K ><£> fft (C tt iff tz . 

z\cd izLTt%at-zrifzwn<DE.m$km.7iMmF re i (Dm.fc=?<D?km-£ti 
rz m. m ss ¥\ % . sa ?<j * <75 @a ?<j s # 1 ic ^ -r „ * £ . E?ij#§2Cf<D7 5/ iiiH 

Skffr $ *l ifi e ^ <Z> £ iffi . PlfitSFREl (UtT. reconstructed FRE1 
OBgT r r ef relj t^o , ) tLfz. 

$%ltf)re f reltt, %Z 5 ffllZjik-f S m<D-t if X > b ( A ~ E ) C^ttt 

^ fijt * n & o 

Segment A extends to 434 bp and was designed to carry restriction sites for EcoRI from position 1, XbaI from position 7, and BamHI from position 429. Segment B spans 429-845 bp and was designed to carry restriction sites for BamHI from position 429 and MroI from position 840. Segment C spans 840-1275 bp and was designed to carry restriction sites for MroI from position 840 and SalI from position 1270. Segment D spans 1270-1696 bp and was designed to carry restriction sites for SalI from position 1270 and PstI from position 1691. Segment E spans 1691-2092 bp and was designed to carry restriction sites for PstI from position 1691, SacI from position 2081, and HindIII from position 2087.
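As a quick consistency check on the segment coordinates, the following Python sketch computes each segment's length and the overlap at each junction. It assumes the positions given in the text are 1-based and inclusive and that segment A starts at position 1; the names SEGMENTS, segment_lengths, and junction_overlaps are illustrative and do not come from the original.

```python
# Segment coordinates as read from the text (assumed 1-based, inclusive;
# the start of segment A at position 1 is an assumption).
SEGMENTS = {
    "A": (1, 434),
    "B": (429, 845),
    "C": (840, 1275),
    "D": (1270, 1696),
    "E": (1691, 2092),
}


def segment_lengths(segments):
    """Return the length in bp of each segment."""
    return {name: end - start + 1 for name, (start, end) in segments.items()}


def junction_overlaps(segments):
    """Return the number of bp shared by each pair of adjacent segments."""
    names = sorted(segments)
    overlaps = {}
    for left, right in zip(names, names[1:]):
        left_end = segments[left][1]
        right_start = segments[right][0]
        overlaps[left + "/" + right] = left_end - right_start + 1
    return overlaps


if __name__ == "__main__":
    print(segment_lengths(SEGMENTS))
    print(junction_overlaps(SEGMENTS))
```

Under these assumptions, adjacent segments share 6 bp at each junction, which is the length of the six-base restriction site common to both segments.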

4 1 7~ 4 3 6 bp©^t^//hA~EH tnf^l 7 7 ~ 8 3 m e r « 7" 

-7 - £ in 6 in (3 7* -r = * & . - n s ±g s ga * @a ?o & cd sa #1 s # 5 ~ 3 4 iz ^ 

-7 4. - 5, -6tt7>^-fe>^^T&5. &t^^ > h07"7< 7 

3 <h - 4 # . 3' 1 3 b p^lill. 77^7 1 <h - 2 . - 2 

£ - 3 . - 4 <h - 5 , - 5 t - 6*<f nfn3' *iS(:i 2AU 3 b(D*-K- 

7 7 7"^t)Oj;5l:7"7'f T-^i^Lfc. S , 7"7^7 lt-6li5' 

5fc yffi CD l tfi S g b $M fig m gR{£ * t> o cfc o \z m. •& L Jt . 

3 ©PSB C3 P C R roSJSJffiS: 0 . 8 % 7 # □ - X y ;U T 16Si*S& L 7t t& - f S 
£ tl £ fi £ (4 17~436bp) 0/O K 0 UJ LTflU L . 7"77 5 K 
pT7Blue (R) ^9 $ - ( T a K a R a ®i ) ^^D-;>7'U;„ f btl 
tz 9 a — > <D £ & IS £'J £ HE 83 t , i£ ft DNAsequencer D S Q - 1 
OOOL (&i*Si> £*iJffl UTIE L ^tfiSia^J^ *><Z)£j!ifc Ufco 

8 SHc ;S f 7? ffi ^ J; «9 r e f r e lCD^&^fEj&LTio 

-fe if * > hBRD^Ett, -f > It — h CD fa £ S <D f£ /£ tz i& \Z & g T ; & o . 
ft!i CD -fe ^ * > H:^^Ttt< > tt" — h <D [fi] £ m & < IE L n *fi S @S ?'J £ "a" A. T 

ibtifc r e f r e 1 CD £ £ S ffi ?>J £ 35 9 HI ^ f . r e f r e 1 CD ffi ?'J cD # ft 

(1) It has 75.3% homology with the original FRE1 (the amino acid sequence is 100% identical).

(2) It contains no sequence consisting only of G or T over eight or more consecutive bases.

(3) It contains neither the sequence AATAAA nor any sequence in which one base of that sequence is replaced by another base.

(4) It contains no ATTTA sequence.

(5) The GC content is even over the entire region.
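The sequence features listed above can be checked mechanically. The sketch below is one possible way to do so in Python, not the procedure used by the authors. It assumes that "eight or more consecutive G or T" means a run of eight or more bases drawn only from G and T, and the function names are hypothetical.

```python
import re


def hamming_hits(seq, motif, max_mismatch=1):
    """Return 0-based start positions where seq matches motif
    with at most max_mismatch substituted bases."""
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatch:
            hits.append(i)
    return hits


def check_sequence(seq):
    """Report the positions of motifs that the redesign aims to avoid."""
    seq = seq.upper()
    return {
        # Runs of >= 8 bases made up only of G and T (assumed interpretation).
        "gt_runs": [m.start() for m in re.finditer(r"[GT]{8,}", seq)],
        # AATAAA itself or any single-base substitution of it.
        "polya_like": hamming_hits(seq, "AATAAA", 1),
        # ATTTA motif; the lookahead also reports overlapping occurrences.
        "attta": [m.start() for m in re.finditer(r"(?=ATTTA)", seq)],
    }


if __name__ == "__main__":
    print(check_sequence("CCAATAAACCGGATTTACC"))
```

A sequence that satisfies features (2) to (4) would return three empty lists.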

5c CO F R E 1 <h J± ^ T . r e f r e 1 Tttl^ L /; G &t;'T £D^©ffi?iJ^So 
T^oId:$:^l OiCSt. dtlta. FRE 1 is «fc r e f r e 1 CO IE £U ^ CO 

L 7i 8 K CO G T ^ ft £ % T ^ L CO T' & £ . 1 OEIK^LJtcfc^fc. 
r e f r e 1 T > G T ©tf I^'tcO F R E 1 (3 J.t^TlS| — fb$ni^5^t 

d © J; ^ I; It^filc L/iKeI 1 r e f r e 1 «: ^ /\' 3 (Nicot iana Tabacum 
L. var. SRI) 'sif ALt. 0 K Is ^ CO £ m *7 ± V < -> > W 14 £ fig ft W 6 8 til ft: 
m±l;fZo S4LTtfcil3i:gWii£f TSS r e f r e l««9A*nTK5 
^co n f-t^Itgt^/c^lI^V 5 -y ^7 • +Mf > • /W 7* U if — 
-> 3 > $ fr o it . ^ co *£ m 0 K $c & ft: T 1 # 6 3& n tf - co m ft: t~ *s ^ T r e f r 
e 1 m.fR=F cO#3E^ffi£I££ tlfz. 

S-) (#%Xitt (4) ) (C IB ©<7> DTfr O - £#T# -So 

«fc 0 ^ ft: W fC tt . SJE©^i£T?pT7B 1 u e (R) * 9 & — \Z {7 u — - > # 

Lfcrefrel CO $1] IS # JSt XbaliSac 1 i: <D 7 ? # * > b & * A*-f :MJ 

— pBI121 (TOYOBOM)co^-glucronidase 

CO ORFt^ffitCcfcOxt^L, /HtU-^^^-pRF Uff)$Lt s A' 

-ft'J^^-pRF 1 CO #1 i£ £ fg 1 10i:St. 

= K p R K 2 0 1 3 (helper plasmid pRK2013) 5 3 7 

Hftil^SLfc. — 7^A^r'J^A7^77yI/XC58 (Ag 
robacterium tumefaciens C58) £ . iI9J £ in £ ft W £ ^ tr L B ?i£ft:i§ 1 m L ^ 
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2 6x:~c 2mmwi.%mvtzo cms*, tft^ft 1 o o u l & «t o iK'M^m. & ■ 

£ £ & H L B 7" U- - h ± \Z & £ L- , 26tt2!t«8Lfcft. S & * A ^ a. 7 T 
7> - h^^ttOl^^l — h (\00ug/uL rifampicin (Rf) LlSug/ul k 
anamycin (Km) $ttfLB7>- M l:t, 2 6 tT 2 »gi LTy >^J^ □ 

y/^'J^D--^4mL©LB (Km, R f )«£{£ ig flfe f „ 2 6 °C IT 2 lift $g 
& Jg * L » 7J^'J-SDSS (a 1 ka 1 ine-SDSfe) T, 7"77SH 
<fe ffl ttS L - mi fig B$ % (3 «fc 3 Hfr A° * - > * * £ d £: -C , P R F 1 <Z> # ft * fit IS L, tz . 

i±i(D^;\D W 8 c mBif<Z)g^s& 2 ~ OlO . <&3Effi3R 

110%, Tween20 0. 1 % ) TifcLfcy t - HCAft, 1 5 # IB1 flfc 
j$L/<C^f)iilfc. ^Oft. SSStK-C 3 [Hl«0Tofc^. ^£8 mm ft * X 
-CSJoJt. v ^ — U {ZSI&T*. (^7c^)t ^ > LB (Km, R f ) mfc^ ±l!i 4 1 . 2 
etTZP^giLfc/Ht'J-^^^-p R F 1 ^feOTi'O/^f U 7A7^ 
77->I>XC 5 8 (Agrobacterium tumefaciens CSS) CDigijR $ 3 m L &3]P A 

7c. i »f /u v v «-^» < M£fx o Rfc# , $ e> tc*— h ^ u-y 

^ it £ . m s m toi ^ > x ;u t x — > & ^ 9 v > P % £ Jp a fc ig tm ± b 
€r . 2 stTWsu^ffTaaiffljgSLfc. « & . ^ m- * m sa <z> m s m to» ic $ 

ibiZ??y*7> (CLAFORAN) SrJjP^.fcJgil!lt^Ll®ffl^«Lfc^. mJfEtf)^ 

^ 7 * ^ > £ jp z. tz m t& k £ ■=> # > & jo x tz m s m ut k s l- , 2 iara 

rttcffiA^^fco 7j juxtfmmz n. -> ^ - 1- (shoot) nzc <=> 

**TM5ta->:x— h (shoot) £ ^ 9 ¥X 9 . M S tg tit fC # "7 -f -> > £ SP A 

fc m tiii c ^ l tz . 

-> — h (shoot) ^ 6 *B # til 7c: t> <D * A — 5 jl 7 -T h £ ffl rt> . a 

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<^ 3 & <£> £ SIS 1 3E]{C¥J^<hLT^-r, 
fzo f (DiS^H l 4li:it. 

(^iSfr) (2) ) {CftoTfif^. Ilbftityy ADNA€> M 

lllEc oR IRLKH i nd I I I £ t> "fe H T ilHt L . r e f r e 1 CD £ g $r 
)t £ M t L T ^ fi£ L 7" P - y' (fcfzL [ a - 3 3 P ] - d A T P £ ^ fc ) £ 

^ i 4i ici^ny; 5 v v*r*f > • a-t -7 u y-r -tr— •> 3 >-eteftiiji5gi?3t«i 

II <7) B# T D N A fi £ ffiij A . ffifl Eif 3fc ffl S & x * y — ffi £ fx o fc fc «> K: 

# A 2 n*: ite^O zi t° - £Sc £ £ Bj* LTtr>*fc>tt-ett&lr>. 

ftfjfllfl^ E c o R I Rtf H i n d I I I ^ it ffi it T tif ^ T CDfll T M 
£ n £ , 3. 2 k b p0i:t$(:A'> K^II^nt. L L , No. l 2©1 
{* ^ b « 3 . 2 k b p «fc 0 T \Z /Jn $ ^ A > K =b tft tb 2 n it = d CD IS * i~ «fc 
tU£ > No. 1 <h N o . 1 llZlJlnt'-, No. 2 Id « 3 □ t° — 7^ 4 Zl hf — . 
No. 9 i:(J4 3 t"-CO, r e f r e l^SLtl^t^Abtl^. 

tt 43 . No. 1 2WH i n d I I I iCct^iftttyj^COD - K57©t*A' 
> K tfilfc f±5 $ n & o , 

BUfficD^ y ^ -y £ +Mf > • Af 7 'J 5>W if — -> 3 > fi? «f v> * ^ b . ZZ Til 
/u7c 5 mftiZZ) HTfi. r e f r e 1 Jftfi^ «? A $ tl T ^ -5 d <h t> -3 7c . 

SfgiHEc oR IR^H i nd I I ITWOffltt. "7" P ^ — ^ — S ^ — 
S^-^iTA^OUSSniCt. lALtr e f r e lS<Rf(iCaMV3 5 

s y p — ^ — coaiTTmRNAA.tgf $n?>^ti:ai) 0 no. i 2 <de 

c o R I & tfH i n d I I I ©if TfWf ftl:*^T. 3. 2kbpiO : bt)T 
tc /h $ ^ A > H *> ft Hi $ tl cd ( J , is b < @ A $ tl tz n > X h 7 £ b V> o ■<=> 

co-o^i^yy Aizia^jAjn^ici^T^nTLSu. flu&y y acd e c 
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oRI^Hi n d I I I CD ifi < fc A o fz <D tz 5 o t % A £ il 3 . 

& (£ » ntj ffi (7) y / 5 v £ It it" > • A < "7 U -fcf - -> a > T r e f re lfg 
=?-<r>M A^fflilga titzfizmut& £ nfe ^ /X n CD N o . ltNo. 2i:^^T, £ 
g © m R N A ^' T' § T ^ I <h ^ / - f > fir l: d; 0 1 1 b t . 

c <z>A¥#t iz 4s it & , 7*D7t-f>^«^ftKii, (0lx.tt\ r/7o — 

y-^7I>Xj (MftXfbtt) (1) ) C^oTfT^, AO"'jy< 

y - tt* > a < ? -f a - -> 3 ><d *s m * m 1 5 m k ^ -r . si situ, sn? 

£M (W. T. ) <7) v— y \z. ttA > K tttfttti $ =n&7&> -3 fc. No. 1 t N o . 2 
©l/->l:tt, 2 . 5 k b a> * # £ * v * — & a* > K #tfe Hi 3 n , n =fc 0 /h 

$ & & m \z ^ <-Dft<D/\> h ifi tfntizntco 

y — If > JS¥ tfrK: «fc 9 , r e f r e 1 WiSAi: J; 0%±S<Om R N 

^e>©JftttS{C«toT(Tofc. 7a^-^-CCaMV3 5SSffl^t:ti^i 
4*J fls £ T r e f r e lHHS^j^fg^LTtr* li^©iRNA^a 
HI L 7t ituI5C0fi?#T =fc o T t>. ffi%ft:OTit , ^:e>^-l?fcffiT'b|p|i;«l{Zi|£¥^fffe 
ft, po 1 y (A) <DttmiiLm , b&.t> Ztz^Z. £Hbffi$&-Vgti. 

J - +f > A -f y 'J # <i if - v 3 > CD & m ^ b . r e f r e 1 £ # A L tz 9 A" u 
Ttt2. 5 k b(Ote?i«J^flSg$nfciOT. poly (A) CDttfiPtt. NOS 
<7)^-5*-^-{Ci;^T&C^T^££#A6ft-g>. r e f r e 1 £ @ A L 7i 

^1 5it;^bn5. 2 . 5 k b«k0t)/h3a:/O Ktt. ftftgCffilK L t¥ 
K £: it $£ "T <5> <h *> <k -5 if r RN AWfilC^lH^tlT^/:. r RNA^coya — 
:7 a> # *|# & B<J (R # £ A 6nfci». i4i»R NAi:ttA< yj y-C XL 
T V* & ^ £ t. £ , r e f r e 1 <D mWMtii £ A 7" U XLTl^Ii tt R8 
iS H ft H o r e f r e lTt4fi«t 0 t)g^inRNA*<hf ^ft^6ttTli4 
<t ^ r> pJ ftgtt tt & -5 . l^lfcii: r RNAilslU^t^Cttil^nfcli^b 
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# A £ £: , R N A CO ®j R# (- . r e f r e 10mRNA^'^:ffi(C#ffi-r-S> r RN 
Ate 31 $T t>nrc(DT"teU^fr z_ btlS. poly (A) +RNAt'(t^M 
ilty-f yA^^'J^^t'-yay^fr^CiT'. d CO g * H b ^ (c "T 3 
Z\ ch a< T # -5 , ^1 5 0i^t)i»a:<t5l:*SI!^^iS©mRNAT*ofc 
cOT^coflffi £ 0 

£ 7t . z CD j — +f > a -f 7* ij y < -t* - x a ><D fc. «e> (z . -9- +f > y\ -r y U y < if 

- -> 3 > co «g m . nf-T«^fcNo. ltf>Jfi?«te&#£. 3 X li 4 □ t° - T 
fc-^TtNo. 2 <7>JBJCIEtfe#<D R N A** Eft L . r e f r e »3 f 

— ScfC?J£oT. No. 2©^*tA> K^ffl&^lciS^C t*<t)*ot. 

$ 6 ic f#e»nfcig«*sJji$nfc^/tD < tt -r -r -> > t- ii & ) 6 8ift®o 

il7CjSt£<BflSI8K:te. Fe (II) C0&^&^ U — ^— T*<5. A V 7 x. 7" > V 
n "J > >? x ;U /ft ( balhophenan thro 1 ine disulfonic acid (BPDS) ) # 

Fe (II) tl^*S:i^t^CtT'*fe^Mt§Ct$fi|fflLfc t 

izmfr-ttT )i 5 *-r;i/ Tit yt lt 2 7 tt 2 4 0#fhi#b l fc. BRte&w<»mm 

Z\(D£o\Z. mX"(DE.m&m.7tBmfeVt<Dmmizm ^Tz 6 mfe (kanamycin & 

so «-r^THffii^it>cStt^^tti $ nit. tut? sin ^m^mo? 

a*. 1 n#Rfi <" 5 t^a* 6 co it ^\zm$g-z ttfc. ^SHcffi^fcj&Rtean*: 6 fla#-r 

Z. <L fzff . fe&UmmW < 4Ii:tt^TI^l 

tlTLiott^«ilR|*«i6ofc. mil, # A L/c r e f r e lit{£^COalT'CO 
ff&ifi. m b GO * # — X A T" 61 L T ^ & CO T tt J& ^ ^ t }f M £ n & „ CL CO 7i 
& 3; 0 m 53 g& ttfr *5 & frr> tz tf* C a M V 3 5 S 7 u t — <? — CO ffl ft CO T T 
r e f r e 1 it fg ^ II *5 ^ T t> & ¥ • $ tlf&m L T ^ S ZL t S # ^ 
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c © j; 9 i: . *fgsj#b«. mm (D = m&m.7imm f r e i % 
* a- n t- % m t 5 c <h a< t # s i»t ? a* =j * &>j §g l jt . 

mRNA»-§WH> hD>tLT7 7"7^->>^$nTl^ pJfi£14 £ . 

% (coding region) CDi^T^U (a) # In L T 5 «h t> 5 2 O <7D T>I fig ft # # 

A b tl -5 „ 

Fifth, it was found that, when a gene from another species is introduced to produce a transformed plant, the coding region of the gene must be designed so that poly(A) is not added within it: the coding region must avoid not only sequences consisting only of G or T over eight or more consecutive bases and the sequence AATAAA, but also sequences in which one base of that sequence is replaced by another base (that is, NATAAA, ANTAAA, AANAAA, AATNAA, AATANA, or AATAAN).
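The six degenerate patterns named above can be generated from AATAAA and turned into a search expression. The following Python sketch illustrates this; the helper names are hypothetical, and N is treated as any of A, C, G, or T.

```python
import re


def degenerate_patterns(motif="AATAAA"):
    """Replace each position of the motif in turn with the wildcard N."""
    return [motif[:i] + "N" + motif[i + 1:] for i in range(len(motif))]


def to_regex(patterns):
    """Build a regex that finds any of the patterns, including overlaps."""
    alternatives = "|".join(p.replace("N", "[ACGT]") for p in patterns)
    return re.compile("(?=(" + alternatives + "))")


def find_sites(seq):
    """Return 0-based start positions of AATAAA-like sites in seq."""
    regex = to_regex(degenerate_patterns())
    return [m.start() for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    print(degenerate_patterns())
    print(find_sites("GGAATCAAGG"))
```

The zero-width lookahead is used so that overlapping candidate sites are all reported rather than only the first of each overlapping group.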
S 7c . G R tit C CD ^ £ # £ fgg m :b 7c o T — 3t Id ft 5 «fc e> (C IS It "T 5 - 1 1> s 

£ b (C, ffir IE <D fg 0J CD H ft g# 0J \Z & ^ T . yDt-^-tLTCaMV 

3 5 s ^ffl ^fcoTiRtsi^nfc^/t^TU, m i wfa'£fc-eE.ffi$km.7tmmtf 

it ^ T nj t ^ & o 

ii««i:fifawfc'^. -r * f 4 -c # ; + # i 3 # & -r s « & tt « = ffi 0; £ - ffi & 

ClTcLTiiKbTt^ro-eii/i^^i^Abtix^l). 0£ ^ 2. <d m f 4# H M {' 
l<7'nt - ^ — fz. *%B«cr)H{ffi^jl7cillgjte?r e f r e Uofi<*:t 
lc«to, tt^5L^frT"ettOflftiR«flS (I) i©JR«flf (II) tzwtmz it z> z. 
(1) Cloning and Sequencing (1989), Noson Bunka-sha
(2) PCR Experimental Protocols for Plants (1995), Shujunsha
(3) Bio-Experiments Illustrated: Basics of Gene Analysis (1995)
(4) Lab Manual: Functional Analysis of Plant Genes (1992)
(5) Askwith, C., et al., Cell 76: 403-410 (1994)
(6) Brown, J. C., et al., Soil Sci. 91: 127-132 (1961)
(7) Chaney, R. L., et al., Plant Physiol. 50: 208-213 (1972)
(8) Dancis, A., et al., Mol. Cell. Biol. 10: 2294-2301 (1990)
(9) Dancis, A., et al., Proc. Natl. Acad. Sci. USA 89: 3869-3873 (1992)
(10) Dix, D. R., et al., J. Biol. Chem. 269: 26092-26099 (1994)
(11) Dix, D., et al., J. Biol. Chem. 272: 11770-11777 (1997)
(12) Eide, D., et al., Proc. Natl. Acad. Sci. USA 93: 5624-5628 (1996)
(13) Fujimoto, H., et al., Bio/Technology 11: 1151-1155 (1993)
(14) Gallie, D. R., et al., Plant Cell 9: 667-673 (1997)
(15) Georgatsou, E., et al., Mol. Cell. Biol. 14: 3065-3073 (1994)
(16) Guo, Z., et al., Biochem. Sci. 21: 477-481 (1996)
(17) Hassett, R., et al., J. Biol. Chem. 270: 128-134 (1995)
(18) Hether, N. H., et al., J. Plant Nutr. 7: 667-676 (1984)
(19) Hincha, D. K., et al., J. Plant Physiol. 147: 604-610 (1996)
(20) Kozak, M., J. Cell. Biol. 108: 229-241 (1989)
(21) Lin, S. J., et al., J. Biol. Chem. 272: 9215-9220 (1997)
(22) Marschner, H., et al., J. Plant Nutr. 9: 695-713 (1986)
(23) Mewes, H. W., et al., Nature 387: 7-8 (1997)
(24) Mori, S. (1994) Biochemistry of metal micronutrients in the rhizosphere (Eds. Manthey, J. A., Crowley, D. E., Luster, D. G., Lewis Publishers) pp. 225-249
(25) Naito, S., et al., Plant Mol. Biol. 11: 109-124 (1988)
(26) Nakanishi, H., et al., Plant Cell Physiol. 34: 401-410 (1993)
(27) Nayak, P., et al., Proc. Natl. Acad. Sci. USA 94: 2111-2116 (1997)
(28) Ohme-Takagi, M., et al., Proc. Natl. Acad. Sci. USA 90: 11811-11815 (1993)
(29) Okumura, N., et al., J. Plant Nutr. 15: 2157-2172 (1992)
(30) Okumura, N., et al., Plant Mol. Biol. 25: 705-719 (1994)
(31) Olsen, R. A., et al., J. Plant Nutr. 2: 629-660 (1980)
(32) Perlak, F. J., et al., Proc. Natl. Acad. Sci. USA 88: 3324-3328 (1991)
(33) Stearman, R., et al., Science 271: 1552-1557 (1996)
(34) Takagi, S., Soil Sci. Plant Nutr. 22: 423-433 (1976)
(35) Takagi, S., et al., J. Plant Nutr. 7: 469-477 (1984)
(36) Whiteley, H. R., et al., Annu. Rev. Microbiol. 40: 549-576 (1986)
(37) Wu, L., et al., Plant J. 8: 323-329 (1995)
(38) Yuan, D. S., et al., Proc. Natl. Acad. Sci. USA 92: 2632-2636 (1995)

fc ps^ $ n -5 *> CD t it ^ o 

Xj (UttlSCftfr) »C$i£n, l£f «tfiSBE3»J0)«¥tlf (3 DNASIS (Hitachi^) 

SSJEflsj 1 ( F R E 1 A L fcJgK4ESI^ An ^ S R N AOtttB) 

FRE 1 £ # A L ^ fll ^ A 3 6 <Z) ± R N A <D fcfcWi < N a i t o e 

t a 1 . 1 9 8 8 ) ICtlEoTfT^fc. 

FRE 1 £ # A L fc fl? & ^ A* 3 CD it 2 g £ ?L tC A ?K # S 3!i £ flD A T 

^icto^^Lfc. fiW^i t~ 3 fg a cd &i ft ffl @ tff % t m m <d y x / - ;u / ? u 

nfr)lA (1 : 1 ) £r ?JP 8 0 0 0 r pmtl 5»l»DLf;f . zk 11 £ 

nozMUAttbtB^llIlfT^fc. - 8 0 "C T 3 0 ft- X ? y — JUfctjK L . 8 0 0 0 
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rpm. 4tT3 0 »I6 La® S: 7 0 % / -^TJt^i. «EEft*SLfc. 
£tMx£ lmLODEP CzkHjgA* 13500 rpmT3 L T ± m £ fr 

L^f a- 7" tC ^ L . 1 / 4 v o 1. ©1 0M L i C 1 £ JP T tK ± T 2 f B 1 
#B Lfc. I2000rpm. 4tT10 frit^ L . fctK£ 7 0 % / 
ifc ?* & . «H 3£ JSS L it *> «> £ D E P C * 5 0 iz L II ffif L . 

Extraction buffer

1 M Tris-HCl pH 9.0
1% SDS



HlJ5£0iJ 2 (tfij (A) +RNA©llt c D N A^-pEl) 

^ItWltjlfcllRNAWl OOMg^S. ^^tt'-XtUJ ( D y n a b 
eads Ol igo) ( d T ) 25 (DYNALS) £ f i] ffl U T > # U (A) 
+ RNA4«»Lfc. £ CD # U (A) + RNA^TE CD A < ^ U '7 ^7^7 
— Sr ffl ^ T . 3 7tTll 13 M-MLVUA- X h 7 >X i7 U (M- 
MLV reverse transcriptase) (TOYOBOl) C 

Hybrid primer (dT17 adapter primer)
5'-GACTCGAGTCGACATCGATTTTTTTTTTTTTTTTT-3'



nmm 3 (RT-pcRtnsE5!i(D*g) 

nmm 2 x-m e> ntz c d n a sj t lt /w y u -v h 7? < -?-iz&m&)fe 

7*7 4 FRE1CD5' 7^^^ — T , PCR^fTot. 

P C R OKJ&mVl £ 0 . 8% T^'D-XTltskiL. &Sftifc/\*>F£p 
T7Blue (R) ^2 ? — (TaKaRaU) ^ $ u — ~>{?Vfc. n u ~ — 
SrLBigtUiTldfc^^tg^L. 7WJ - SD SST77^5 h* £ ffl tb L , ft] @ 
l#S^a(iJ:0-f >+r- h*<A^T^5;i£a<fll&£ft;fc7-oa>^D — ><D*gS 


ftimzyfi^X I- DN U * 7— if (Bca BEST DNA polymerase) HTftH - 

Lit. ( rn<*-H5ft <77 K •S£i l «¥«f(OS«j ) (^j0%t) . 

Hybrid primer
5'-GACTCGAGTCGACATCG-3'

FRE1 5' primer
5'-ACACTTATTAGCACTTCATGTATT-3'

PCR program
(1) 95°C, 5 min
(2) 95°C, 40 s
(3) 55°C, 30 s
(4) 72°C, 1 min
(5) 72°C, 10 min
(6) 4°C

Steps (2), (3), and (4) were repeated 40 times.
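To see where the FRE1 5' primer anneals, the primer can be searched for in the sequence listing. The Python sketch below uses only residues 61-120 of the FRE1 sequence as printed in the listing; the function name locate is hypothetical, and positions are reported 1-based to match the listing.

```python
# Residues 61-120 of FRE1, copied from the sequence listing.
FRE1_61_120 = "AGTGCTACACTTATTAGCACTTCATGTATTTCCCAAGCTGCGCTATACCAATTTGGATGT"

# FRE1 5' primer used for RT-PCR.
PRIMER = "ACACTTATTAGCACTTCATGTATT"


def locate(primer, fragment, fragment_start=1):
    """Return the 1-based position of the primer in the full sequence,
    or None when the primer is not found in the fragment."""
    index = fragment.find(primer)
    if index < 0:
        return None
    return fragment_start + index


if __name__ == "__main__":
    start = locate(PRIMER, FRE1_61_120, fragment_start=61)
    print(start, start + len(PRIMER) - 1)
```

With this fragment, the 24-nt primer matches the sense strand exactly, starting at position 67 of the listing.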

ZKD*§J£. FRE UiALitllfEi^A'DT-tt, FRE lief^bli? 
$ tl m R N A IZ it . ^ 3 lH tl ^ "t <t -5 U itt ft T # U ( A ) # »D L T ^ . 

# U ( A ) <7> ft JP fi S fi — *I T tt & < , lKO^Ofi3CmRNA^i?ttLfe. 
# U (A) it -f h (poiy(A) site) <£> ± Sfe 03 /J? U (A) -> 9 -t )V (poly(A) sign 

ai) t. Ligatsnt^s tJBtonsiE?'j*«ii£$ns* u (a) ->^jp ( 

putative poiy(A) signal) t LTSLt. 

SIJE0IJ4 (P C R ffi tc «fc * €■ -fe ^ ^ > K<7)Mjit) 

S- -fe ^ * > h tt , 3! 5 HI fC ^ £ n £ , PCRffiCiOSiLfc. ^«y^#'J^ 
7 — if (Taq polymerase) X — — $ y 9 (super Taq) ( +r "7 ^ -Y — ) £ fsH 

The compositions of the PCR reactions were as follows.

First-stage PCR reaction mixture
  10x reaction buffer                  10 µL
  2 mM dNTP mixture                    10 µL
  20 µM primer (-3)                     5 µL
  20 µM primer (-4)                     5 µL
  Distilled water to a total volume of 99.5 µL

Second-stage PCR reaction mixture
  First-stage PCR reaction mixture      1 µL
  10x reaction buffer                  10 µL
  2 mM dNTP mixture                    10 µL
  20 µM primer (-2)                     5 µL
  20 µM primer (-5)                     5 µL
  Distilled water to a total volume of 99.5 µL

Third-stage PCR reaction mixture
  Second-stage PCR reaction mixture     1 µL
  10x reaction buffer                  10 µL
  2 mM dNTP mixture                    10 µL
  20 µM primer (-1)                     5 µL
  20 µM primer (-6)                     5 µL
  Distilled water to a total volume of 99.5 µL

PCR program
(1) 95°C, 5 min
(2) Addition of 0.5 µL of Taq
(3) 95°C, 40 s
(4) 45°C, 1 min
(5) 72°C, 1 min
(6) 94°C, 40 s
(7) 60°C, 30 s
(8) 72°C, 1 min
(9) 72°C, 10 min
(10) 4°C

Steps (3), (4), and (5) were repeated 5 times, and steps (6), (7), and (8) were repeated 20 times.
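The cycling program can be written down as data, which makes the repeat structure explicit and allows the total hold time to be estimated. The Python sketch below does this under stated assumptions: step durations are taken as 40 s, 1 min, 1 min for the 5-cycle block and 40 s, 30 s, 1 min for the 20-cycle block; ramp times, the Taq addition step, and the final 4°C hold are ignored. The structure and names are illustrative only.

```python
# Each block: number of repeats and a list of (temperature in deg C, seconds).
# Durations are assumptions read from the program; ramping is not modeled.
PROGRAM = [
    {"repeats": 1, "steps": [(95, 300)]},
    {"repeats": 5, "steps": [(95, 40), (45, 60), (72, 60)]},
    {"repeats": 20, "steps": [(94, 40), (60, 30), (72, 60)]},
    {"repeats": 1, "steps": [(72, 600)]},
]


def total_seconds(program):
    """Sum the hold times of all steps, taking block repeats into account."""
    total = 0
    for block in program:
        block_time = sum(seconds for _, seconds in block["steps"])
        total += block["repeats"] * block_time
    return total


if __name__ == "__main__":
    seconds = total_seconds(PROGRAM)
    print(seconds, "s =", seconds // 60, "min", seconds % 60, "s")
```

The low annealing temperature in the first five cycles and the higher one in the following twenty are kept as separate blocks so that either can be adjusted independently.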

nfigtfij 5 p — - > tf £:i&&m&\(Dmm) 
mmm 4 cam 3 mum co p c r (DEimm^ . 0. 8 %y # o-xy;n!tMi 

®J L 7t & . fgSn5S3 ( 4 1 7 ~ 4 3 6 b p ) W A' > K £ 0 tti L T fit S!£ L . 
7"^X=KpT7Blue (R) ^^7^— ( T a K a R a §2 ) ^^D-->^L 
Jt . ?#«=>ftfc^Ci — >Q&glE5>J£:flfe&L. SHIMADZU & ft DNAs 
e q ue n c e r DSQ - 1 0 0 0 L £ *ij Jl L T IE L 1^ g IE ?U £ t> CD £ 

-tn-t'n^iE L ^IS^iJ^ fe-Q-fe y ^ > h fi?JIE#3!§SI5&«:fi|ffl LTI 

8HI(Dcte>{Cr e f r e l<&£fi£(£l£L;fc. -tr^'^>hB. E . f > +>- — h 
0(Rj#*<±S<Z)f^fiJcfcJi>*w^STfer3fc. fill CD -fe ^* ;< > hCO^TfJ-f > ii- — h 
CD Jb] # |?fH£ & < IE L ^ 2£ @E £: "a -5 fcco^r^Effl L . 

tfc r e f r e 1 <D ± & S @E ?'J £ , IE <D SE ?J # ^ 1 RZSWi 9 U d^T. 
SUSS^I 6 (r e f r e 1 O^An^OlA) 

5 O^lSLtie^ r e f r e li. ^ A* n (Nicotiana Tab ac urn 
L. var. SRI) ^iALfc. M M & & <D & ^ . * ^ ^ < -> > t; jfifttt & « ft) # 6 8 fll 
S±UTf fcftiftllSWie^-Cife* r e f re 1 A £ nx lr> 

^ <& zi e - & £ a ist z> tz #> n y j = -y ^ ■ it-tf > • /\ -r yj y -r -tr 

— -> a > * ?t -3 . fOfiS- 1 3 t: — ^6^3 t° — CD r e f r e 1 it £ ^ CD # 

(1) Construction of the binary vector pRF1 for plant transformation

The XbaI-SacI fragment of refre1 cloned in pT7Blue(R) was exchanged for the ORF of β-glucuronidase in pBI121 (TOYOBO) to construct the binary vector pRF1. The structure of the binary vector pRF1 is shown in Fig. 11.


(2) (;Ht'J-^^-pRFl«7^DA^f'J^A (Agrobacter ium) 

£^tr L B^&igilk 1 m L ^T. 7 >f u 9 W 1±V * 7 t 
•>X>7>C 5 8 (Agrobacterium tumefaciens C5 8) £:2 6 < CT-2Bjfeig||lg^, 
p R F 1 £ t> O^H® t — X 5 K (helper pi asm id) p R K 2 0 1 3 

z ti^xmmz 3 7 "c t i & js « l & , ^nfhi o o v l & t o 

U- h (100«g/(iLU77>ey> (rifampici 

n) (Rf) i2 5«g//xL*t?-f ■>> (kanaraycin) (Km) ^^ttL B 7" 
U — h ) tc, •> > ¥ )ln a ~- &Bf%£ Itfz (2 6tT2t«i) . 

f#bnfc->>^'Jl/3 0- -$4mL©LB (Km, R f ) « fl£ Jg Mi ■*« . 2 6t 
T 2 njfeig^Jgft > 7 )Vts U > - S D S 7 X ^ H £i*aj L . fffi] flg B# S f J; 

S9J»fAS*- >£^& d p R F 1 (Dftte&mm Lit. 

(3) ( 7 £7* □ A £7 -r- 'J r> A (Agrobac ler iura) <£> ^ A zi 0) jg£ %k t ffi Pf (D M 

IF £ S CO 8 c m (5 H \9 A zj ( N i co t i ana Tabacura L. var. SRI) <D^^m$:2^ 
3 0 m 0 . ili (^ffiifiSIBSlOSB, T w e ■ e n 2 0 0. 1%) T iffi & 

Lfc -> ^ - u ic xn, 1 5 Luifi zmm l jt. t <Dfifii*t3ii$i- 

^fc'fg. S$;8mmft!:^XT^ofc, -> •*> — U l" M «£> T *3 tz M Jt . LB 

(Km, R f ) Sfrgft'C, 2 etT-rt^iLt/Htij-Aj^^-pRp : 
^tOT^n/^T'J^A (Agrobacterium) <D lg ft fl£ £: 3 m L £ JO A ;t » 

1 # f£ . /U 7 - J 7 h T t (i ^ < 1 $ ii 0 f , $ e (3 =*• - h v v — ~? 

mm Ufcffi«£±T3fe» TiJK^lR 0 Kt^fc. 1H"^TE(DM S i£tl!i (II) ±tcM#. 
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2 5 1 Rji 3k # t- 3 a Ltco ^(Dmm^^usmm. (hdc^ l i mm® 

mLfzm. MSigtlii (IV) (c |L 21 FbI iCiAi^t". # JU* #i9iig £ *l 
•> a. — h (shoot) M fig $ ft e> ^ 7> T? 1 -> a. — I- (shoot) £ « 0 
0 M S Jg ( V ) iZ&Lfc, is i - h (shoot) ^ b « # fcB 7c fe CD & A - ^ 
4^ =l 7 -f h W. X X . /\-f #?7n (A-f#?7^^ v A > ) £ ^ X T 



Major elements (g/L)
  NH4NO3                 1.65
  KNO3                   1.9
  CaCl2·2H2O             0.44
  MgSO4·7H2O             0.37
  KH2PO4                 0.17

Minor elements (mg/L)
  H3BO3                  6.2
  MnSO4·4H2O            22.3
  ZnSO4·7H2O             8.6
  KI                     0.83
  Na2MoO4·2H2O           0.25
  CuSO4·5H2O             0.025
  CoCl2·6H2O             0.025

  Fe(III)Na-EDTA         0.042 mg/L
  myo-Inositol         100 mg/L
  (name illegible)       5 mg/L
  Sucrose               30 g/L
  Gellan gum             2 g/L

The following were added to this medium to prepare the MS media listed below.

  Benzyladenine (BA)               1.0 mg/L
  Naphthaleneacetic acid (NAA)     0.1 mg/L
  Kanamycin                      100 mg/L
  Claforan                       200 mg/L

MS medium (II)    MS medium (I) + BA + NAA
MS medium (III)   MS medium (I) + BA + NAA + Claforan
MS medium (IV)    MS medium (I) + BA + NAA + Claforan + kanamycin
MS medium (V)     MS medium (I) + kanamycin



( i ) ^A'D^b©yy ADNA«atn 

^A'n^bwyy ADNAtDMttiii r 4g 4fe ^i us x 3* -> u - x • mm (D?cRm m 7 

nhn-Jh ( 51 f4 tt ) l:f ofc. 

?1&MC V 0. 1~0. 2gOlSXn. S»S35SllDAT^il:tOO^Lfc. 
£ X -y ^ > h'^7f a-yi:An, 3 0 0 # L W 2 % C T A B & ^ £ Jp A . 
?g £ L . 6 5t;i:3 0^FltP?SLt„ f icD^aa^^A • ^ V7 5^7Jl.n- 
;i/ ( 2 4 : 1 ) £ JO A 5 # m £ L . 

1 2 0 0 0 r p mT* 1 5 5J>lffljS'D L. ±® £ §r L l^i — L , ? a a * 

• -fV7*;U7;Un — JUttm£*>5— &»9igL. ± H £ $f L 5P - 7" (I 
L jfc . 1 ~ 1 . 5^1© 1 %CTABiffi£jDA. Zg^L. g i& T 1 H# RSJfl* fi 
L £: . 8 0 0 0 rprntlO^HMLfc. ll^iT 4 0 0 /i LO 1 MC s 
C 1 £ AP X . 6 5 <C T" it fig; # 5c £ \Z m ft Z> * T JO @ L it o 8 0 0 «L©100% 
X^y-;p*JnA, ffi^L. -2 0tt2 0^FISlf 1 12000 r p m T 5 
±m*J#T. 7 0 % X^ y -;UT'#c-#t£, MEE$Li$k L fc h <D & 3 
0 M L © T ElIiKC/g^ Lfc. 



2% CTAB solution
  100 mM Tris-HCl (pH 8.0)
  20 mM EDTA (pH 8.0)
  1.4 M NaCl
  2% CTAB (cetyltrimethylammonium bromide)

1% CTAB solution
  50 mM Tris-HCl (pH 8.0)
  20 mM EDTA (pH 8.0)
  1% CTAB



( 2 ) m mm m iz «t 3 y / a d n a <d w m t m^fcm 

ffr] flg i£ 31 ® II , pCaMV3 5Si>b t NO S SfA««lli$n5 E c oR I 

R^H i nd I I I (DM^ IZ £ Z> Mlt £ . p C aMV 3 5 SO±iT"«i$ft^ 
Hi n d I I I 7c C"t ~C <D ffi it&fi-ofzo 

y/ADNAl 0«g*. SfBll lOOUt l-t mm.Mm&km L . 

X ^ / — JU «t fS U T 2 0 u L <D T E mffiWlzmm L fz „ 1 od i n g bu f f 
er 2 aL^JDATO. 8 % 7 # n — xyjH:, 6 0 t^*iL 

7c o lb 7 & , y^^lf A^P"?^ h't^fel, UVh7>7<^U 
- * - ± T X ^ - ;u t £ t> (C ^ M £ fflf? L 7c . 

( 3 ) 7 a y t- 4 y 7 £ a < 7 U 7 -f M — -> 3 > 

^H«f**©y;n£aSsfS*Tifc#L, o. 2N h c i tfe 1 o l fc. 

fee 0. 4N NaOHTt-f □>/>yi'> (New Hybond-N+ , 
Amersham) l:h7>X77-L, 2 x S S P ET 5 ^fto/cttiS-e 3 
H# W ft SI L . /W 7' U y-T -ri- -> 3 XDjjfci* fAW^^I-f 57 h H 77 

K-iae^/Bwosssj ( ^ m tt ) stiifc. ^ > 7" u > * & <=> a* 1; * 6 5 *c 

&C Sn @ b T *3 ^ 7c 3 0 m L <7) A -f "7' U ^' -f -fcf — -> 3 >A'7 7 7-t7l/A-f 7'J 
y -f if — -> 3 > * 6 5 °C T- 1 m !S\ Vf ^ . / W 7' U 3?" -f if — ;> 3 > A* 7 7 — £ 3c 
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& (2 5mL) LTtc "7° O — y £ iP A . 6 5 "C T 1 2 B# Tel A -f y*'J if — -> h 

>^ffot. ^ > y u ><7D #t ^ « & e» a* i; j6 6 5 t; ic an si l t ^ fc «t v$ t 6 

5"C10#3:2IIK h i gh s t r ingen t , 6 5t 1 0^^ 1 

IsHt o 7t . ^ > ^ U > €: -y- 7 > 7 y ^"T "3 "O , -f * — y>^7> - h I: 2 4 

-f>->?T^-7-f1f- (F u j i F i 1 m») £ffifg L fc. 

Solutions

20x SSPE
  3 M NaCl
  0.2 M NaH2PO4
  1 mM EDTA

1 M sodium phosphate buffer
  Add 0.5 mol of Na2HPO4 to 800 mL of distilled water, adjust to pH 7.2 with H3PO4, and make up to 1 L with distilled water.

Hybridization buffer
  0.5 M sodium phosphate buffer
  1 mM EDTA
  7% SDS (v/v)
  At the time of use, add 1/100 vol. of denatured salmon sperm (1 mg/mL).

Wash solution
  40 mM sodium phosphate buffer
  1% SDS (v/v)

High stringent wash solution
  0.2x SSPE
  0.1% SDS (v/v)
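Working solutions such as 0.2x SSPE are prepared by diluting the 20x stock. As a small worked example, the Python sketch below applies the usual relation C1·V1 = C2·V2; the function name stock_volume and the 1 L final volume are illustrative assumptions, not values from the text.

```python
def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock needed so that C1 * V1 = C2 * V2.

    Concentrations share one unit (for example, "x" of SSPE) and the
    result is in the same unit as final_volume.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    # 1000 mL of 0.2x SSPE from 20x SSPE (final volume is an assumed example).
    v = stock_volume(20, 0.2, 1000)
    print(round(v, 2), "mL of 20x SSPE, made up to 1000 mL")
```

The same function applies to the 2x SSPE rinse, which would need 100 mL of 20x stock per litre.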



( 4 ) 7 n — 7 O) ft & 
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r e f r e 1 (D ft & MM L I- T , 7>yi7'7-f7-DNA7^'J >y 

+ y hlg 2 IK (Random Primer DNA Label ing Ki l Vor. 2. 0) (T a K a R a 
£ ffl ^ T f □ - 7' £ f£ i« ( 7c 7c L ta- A2 P] - dATP^ffll^;. ) L . P r 
obeQuantTM G-50 Micro Columns (Pharm 
ac i a B i o techM) \Z <£: 0 3i K (D [ « - •' 2 P ] -d ATP^PI^l 

fZo 

mgk&m I 4iCSt. If! 1 4 El (D 7i <P.« . ffl m W m E c oR I & H i n d 
I I I <D ffi # \Z ct 4 J8 it £ (t o 7c t> 0) T & 0 . £ fflfl . Hindi I I 7c l"t T' <7) 
?8 ft £ It o 7c CD T & <5 , El <t> CD W . T . ill £ M £ ^ f o 

31 JEW 8 ( y — if 

(1) fRNAOtttU 

HJS^J 1 tlp|«lfe^fefZ«fc 0 . r e f r e 1 4iie?iAl/fc)gRea^A3 

(2) RNACDmM£l<!!j 

Ki W • y^Stt • =i — A 43 ck Zf H ft "7 ^ X n * £ ^ £ 77"V^7" (a 
bSolve(RNase Pfi § Sd DU PONTK) ) jffllSLT^^fc. 
20xMOPS10mL, 2. 4g7*D-7, 1 0 0 m L fiiIS** = fi 7 

7^3i:^n, tf u > v t r # □ — x £ jg L 7c . fisotSTftxfct:^ 

T . *^i7Jl/ft F 1 OmLJiPi. 2 0 0 mL C Lfc fcO^i 

feT7-';K:ffl^fc. 7* fb W IZ. tft 8 0 0 m L CD 1 x M O P S £ A tl . 1 0 m g / 
mL0lf->^A7'DT^h'5/iL^j»i 1 ScWjffifflfiKtLfc. £ R N A 1 0 u 
gl: 1 6/iL<DRNAs amp 1 e bu f fe r & HQ X . 0 m 

LtL. 6 StTl O7}FlllD?SL/:i)kJ:i:'5 / if 0 1#Ilfc ! b©^^fjL/c. 

6 ovx'iPtri^iifci. ies i 2 o vi;iT$6i;2iitr H mi 

Solutions

20x MOPS
  0.4 M   MOPS
  0.1 M   NaOAc
  0.02 M  EDTA

RNA sample buffer
  Formaldehyde            1.6 mL
  Formamide               5.0 mL
  20x MOPS                0.5 mL
  Glycerol dye solution   1.6 mL
  Total                   8.7 mL

Glycerol dye solution
  Glycerol                5 mL
  Bromophenol blue        1 mg
  Xylene cyanol           1 mg
  0.5 M EDTA (pH 8.0)     0.02 mL



( 3 ) y' a v t- 4 > 7 t A -f 7" U ¥ -f if — -> 3 > 

*Si*»7&. uv < * — ^ — iccDii-7. f r — )i tpuzw-n&m-o 

fz, 7'D77-f /^rotlili r ^7 □ - - > ^ t -> - $ X > X J ( J8 tt £ ft *t ) 
fi£ (^ , 2 OxSSPET, RNA^^r;U^e>^-l'n>^>^*l/> (New Hy 
bond-N Amersham)HH^>X-77 — L7Co 1 2 P# IB & , * > 7 
V > £ . 2xSSPET5»ISft^. 3^K^aTffifilLfcS&5»raUVHg|tL 
T/>^l/>i:RNASi)ILfc. 

A< ^ U y< -tf- -> 3 > W^ffittU-tf >fl?#TC7)il£- <h[Sl«tCfTo fc. 

IS 151 £ f§ 1 5@l:^f. 1^©W. T. [J^ti^if, ± (W. T. ) 
tf> U- >£ttA*> Ktt^tb^E ftftfrofc. No. 1 <h N o . 2CDl/->tC«2. 
5 k b (D ± # $ tc * ->* — fc A' > K #tfc fcti £ n . tl «fc 0 /h $ ft <i E (z < -D & 
<D A > K # & tti 2 n it . 
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mmm9 ( = mmm.7tmm^^(Dmm) 

r e f r e 1 AL fcBn$£&fo tW±m(D 5> & , ;i— 5 ^ a. ^ 

< M~ l§ ;t # x. A -f tf; ^ -y v* 7x (Hyponex) *4^TfT, is — h (s 
hoo t) ;&< 5 c m~ 1 Ocmte{Cfc^fct><0£ffi^T = to»jE7E»iti?g14£?fl| 

OO I-' />_ o 

Ferric reductase activity was confirmed by exploiting the fact that bathophenanthroline disulfonic acid (BPDS), a chelator of Fe(II), forms a red complex with Fe(II). Agarose was added to the assay buffer to a concentration of 0.4% and dissolved by heating in a microwave oven. Then 1/100 vol. of 500 µM Fe(III)-EDTA and 1/100 vol. of 500 µM BPDS were added, and the mixture was poured into a Petri dish and left until it set. Tobacco leaves were placed on the gel, which was shielded from light with aluminum foil and left at 27°C for 24 hours.

r <d m. m £ pi £ *i m <d m =? & % w £ tz 2 m r b <r> b r $e &4g 4^ Tfrofc. 

CL<7)<h#<7)£JSP*|Hn±lll#lffli:Lfc. 

Assay buffer
  0.2 mM CaSO4
  5.0 mM MES buffer, pH 5.5

^m&RKMmfefccDmmn'^m&mi 6 m^zsm, 171c 2t^@«Hffi 
<=> = m m <d ii ft a< m is $ n fc . 
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.!ff sic co m. iza 

A CO ^ U ( A ) M*DtCB9fi%-rsSiR©i^«*. mRNACDf'J ( A ) # in (C PI £fc 

2 . ^A^nsfliiroaoDjae^^. @# § & & © t> © t $> z> is * co m. m w, i im ic he 

3 . m R N A (D # U ( A ) # JjP IZ H5 ^ "T -5 g B CO M 1ft # . AATAAA«CtS 

ie -c & § t r * co m m m 1 x « m 2 jik 12 $1 co ?i & „ 
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1 ATGGTTAGAACCCGTGTATTATTCTGGTTATTTATATC 1 1 1 1 1 1 1 GCTACGGTTCAATCG 60 

61 AGTGCTACACTTATTAGCACTTCATGTATTTCCCAAGCTGCGCTATACCAATTTGGATGT 120 

121 TCTAGTAMTCTAAMGTTGCTACTGTAAAAACATGAATTGGCTGGGTTCAGTGACAGCA 180 

181 TGTGCCTATGAGAATTCCAAATCTAACAAAACACTAGACAGCGCGTTAATGAAGTTAGCA 240 

241 TCCCMTGTTCMGCATCAMGTTTATACTTTAGAGGACATGMGMTATTTATTTAAAT 300 

301 GCGTCAAATTATTTGAGAGCACCTGAGAAAAGTGATAAAAAAACCGTGGTTAGTCAAGCG 360 

361 CTCATGGCGAACGAGAC AGCGTATCATTATTATTATGAGGAAAATTATGGTATCCATCTT 420 

421 ANUCXMttGClGlCJCJaJ ^^ 480 

481 ACTGCAGCCACTATCTTGAACATTCTGAAA AG5G I ti bbl Gti I l AAGAACATC ATGGCAAAC 540 

541 TCCGTCAAAAMTCACTTATTTATCCTTCTGTTTACAMGATTATMTGMCGMCTTTT 600 

601 TATTTATGGMGCGTCTACCATTTMTTTTACAACTCGAGGCAAGGGTCTCGTCGTATTA 660 

661 A ll lil l U IrlAlvffETQA CTATATTATCT^ 720 

721 CCATATGATAGGCCCAGATGGAGAAGAAGTATGGCCTTTGTGAGTCGTAGAGCAGACTTG 780 

781 ATGGCCATTGCACTTTTCCCAGTAGTCTATCTATTCGGAATAAGAAATAATCCCTTCATC 840 

841 CCTATMCAGGGCTnCCTTTTCTACATTTMTTTCTATCATAAATGGTCTGCCtACGTT 900 

901 TGTTTCA TGTT6GCCGTTGTACACTCAATT GTCATGACCGCCT CGGGAGTGAAAAGAE1! 960 

961 &I.GM ;l . I iC AMGTCTGGTTAGGAMTTTTAC ir^l-AG^i I (jG^ II ATAGTGGCAACGATATTA 1020 

1021 ATGTCTATTATTATTTTCCAMGTGAAAMGTATTTAGAAATAGAGGGTATGAGATATTC 1080 

1081 CTTC TTATTCATAAAGCGATGAATATTAT GTTCATTATTGCCATGTACTACCATTGTCAC 1140 

1141 ACCClTGG^TGtiAltm 1200 

1201 TGCAGGATTGTTAGAATAATCATGAATGGTGGCTTGAAAACTGCTACTTTGAGTACCACT 1260 

1261 GATGATTCTMTGTTATTAAMTTTCAGTAAAAAMCCAMGTTTTTCAAGTACCAAGTA 1320 

1321 GGAGCmCGCATACATGTATTTCTTATCACCAAAAAGTGCATGGTTCTATAGTTTCCAA 1380 

1381 TCACATCCATTTACAGTATTATCGGAACGACACCGTGATCCAAACAATCCAGATCAATTG 1440 

1441 ACGATGTACGTAMGGCAMTAMGGTATCACTCGAGTTTTGTTATCGAAAGTTCTAAGT 1500 

1501 GCTCCAMTCATACTGTTGATTGTAAAATATTCCTTGAAGGCCC ATATGGTGTAACGGT T, 1560 

1561 CCACATATCGCTAAGCTAAAAAGAAATCTGGTAGGTGTAGCCGCl!^Sl!lE^I^S3CG 1620 

1621 GCTATTTATCCGGACnTTGTCGAATGTTTACGGTTACCATCTACTGATCAACTTCAGCAT 1680 

1681 AMTTTTACTGGATTGTTMTGACCTATCCCATTTGAMTGGTTTGAAAATGAATTGCAA 1740 

1741 TGGTTAAAGGAGAAAAGTTGTGAAGTCTCAGTCATATATACTGGTTCCAGTGTTGAGGAC 1800 

1801 ACAMTTCAGATGAGAGTACAAMGGTTTTGATGATAAAGAAGAAAGCGAAATCACTGTT 1860 

1861 GAATGTCTCAATAAAAGACCTGATTTGAAAGAACTAGTGCGCTCGGAAATAAAACTCTCA 1920 

1921 GMCTAGAGMTMTMTATTACCTTTTATTCCTGCGGGCCAGCAACGTTTAACGACGAT 1980 

1981 TTTAGAAATGCAGTGGTCCAAGGTATAGACTCTTCCTTGAAGATTGACGTTGAACTAGAA 2040 

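2041 GAAGAMGTnTACATGGT 2059
5' 3'

A-1 GAATTCTCTA GACTCCACCA TGGTTAGAAC CAGAGTCCTT TTCTGCCTCT TCATCTCTTT CTTCGCTACA GTCCAATCGA GCG 83mer

A-2 GTCCAATCGA GCGCTACACT CATCTCCACT TCATGCATTT CTCAGGCTGC ACTGTACCAG TTCGGATGCT CAAGCAAGTC AAA 83mer

A-3 CAAGCAAGTC AAAGTCTTGC TACTGCAAGA ACATCAATTG GCTCGGAAGC GTCACTGCAT GCGCTTATGA GAACTCCAAA TCT 83mer

A-4 TCCAGTGTGT AAACCTTGAT ACTTGAGCAT TGGCTGGCAA GTTTCATCAA AGCGGAGTCC AGAGTCTTGT TAGATTTGGA GTT 83mer

A-5 TGTCTTCTTA TCGGATTTCT CAGGAGCGCG AAGGTAGTTA CTTGCATTAA GGTAGATGTT CTTCATGTCC TCCAGTGTGT AAA 83mer

A-6 GGATCCCATA GTTTTCCTCA TAGTAGTAGT GATAGGCCGT CTCATTTGCC ATCAACGGTT GTGAAACAAC TGTCTTCTTA TCG 83mer

B-1 GGATCCACTT GAATTTGATG CGATCTCAAT GGTGCGCATG GGGCCTCGTC TTCTTCTGGG TCGCAGTCCT TACCGCCGCA 80mer

B-2 CCTTACCGCC GCAACTATCT TGAACATTCT CAAACGCGTA TTCGGCAAGA ACATTATGGC AAATTCTGTT AAGAAGTCTC 80mer

B-3 GTTAAGAAGT CTCTTATCTA CCCAAGCGTT TACAAAGACT ACAACGAGAG AACTTTCTAT CTTTGGAAAC GTTTGCCATT 80mer

B-4 AGAGTGAGAG AATAGTCAGA ATGACAAAGA TAAGAACTAC GAGTCCTTTG CCTCGAGTTG TAAAGTTGAA TGGCAAACGT 80mer

B-5 AATGCCATTG ATCTTCTCCA TCTAGGTCTA TCGTAAGGAT GTGGCAACTT GATGTTATGT CCGAAAGAGA GTGAGAGAAT 80mer

B-6 TCCGGATACC GAAAAGGTAC ACCACGGGGA AAAGAGCGAT TGCCATCAAG TCAGCACGGC GTGAGACGAA TGCCATTGAT 80mer

C-1 TCCGGAACAA CCCCTTCATC CCAATCACCG GATTGAGCTT TAGTACTTTC AACTTTTACC ACAAATGGTC AGCATACGTC TGC 83mer

C-2 GCATACGTCT GCTTCATGTT AGCCGTCGTC CATTCAATCG TTATGACCGC TTCAGGAGTT AAACGAGGAG TATTCCAGTC TCT 83mer

C-3 TATTCCAGTC TCTTGTAAGG AAATTCTACT TCAGATGGGG AATAGTAGCC ACAATTCTTA TGTCCATCAT CATTTTCCAG TCC 83mer

C-4 ATAAACATGA TGTTCATGGC TTTGTGAATA AGTAAGAAGA TTTCATAACC TCGGTTCCTG AAGACCTTCT CGGACTGGAA AAT 83mer

C-5 GAGGATGCCA GCCATGGACC AGATCCAGCC CATCCATCCT AGTGTGTGGC AATGGTAATA CATAGCTATG ATAAACATGA TGT 83mer

C-6 GTCGACAAAG TGGCGGTCTT AAGACCTCCG TTCATGATGA TACGTACAAT TCGGCAGAAC CTGTCGAAGC AGAGGATGCC AGC 83mer

D-1 GTCGACCACA GATGATTCTA ACGTTATCAA GATCTCTGTC AAGAAGCCTA AGTTCTTCAA GTATCAAGTG GGAGCATTTG CC 82mer

D-2 GGAGCATTTG CCTATATGTA CTTTCTTTCA CCAAAATCAG CCTGGTTCTA CAGTTTTCAA TCTCATCCCT TCACAGTCCT AT 82mer

D-3 TTCACAGTCC TATCAGAAAG GCACAGAGAT CCTAACAACC CAGATCAACT AACTATGTAC GTCAAAGCTA ACAAGGGCAT TA 82mer

D-4 CCTCTAAGAA AATCTTGCAA TCAACGGTAT GGTTTGGAGC GCTTAGAACT TTGCTAAGAA GTACTCTCGT AATGCCCTTG TT 82mer

D-5 GGCCCGCAGC TACTCCTACT AGATTTCTCT TAAGTTTGGC AATGTGAGGG ACAGTTACGC CATATGGTCC CTCTAAGAAA AT 82mer

D-6 CTGCAGTTGA TCAGTGCTAG GCAATCTAAG GCATTCTACG AAATGGGGGT AGATGGCTGC CACGCCGAGG CCCGCAGCTA CT 82mer

E-1 CTGCAGCACA AGTTCTACTG GATCGTCAAC GACCTTAGTC ACCTTAAGTG GTTCGAAAAC GAGCTACAAT GGCTTAA 77mer

E-2 ACAATGGCTT AAGGAGAAAT CTTGTGAAGT CTCTGTCATC TACACTGGGT CATCAGTGGA GGATACAAAC TCAGATG 77mer

E-3 CAAACTCAGA TGAGTCCACT AAGGGTTTCG ATGACAAGGA AGAATCTGAA ATCACCGTAG AATGCCTTAA CAAGAGG 77mer

E-4 GTGATGTTGT TGTTCTCGAG TTCTGACAAT TTGATCTCTG ATCTCACTAG CTCTTTGAGG TCTGGCCTCT TGTTAAG 77mer

E-5 CGATACCTTG TACAACTGCA TTCCTAAAGT CGTCATTGAA AGTCGCTGGT CCGCATGAGT AGAAAGTGAT GTTGTTG 77mer

E-6 AAGCTTGAGC TCTTACCAAG TAAAACTCTC CTCCTCTAGT TCGACATCTA TCTTCAGACT AGAATCGATA CCTTGTA 77mer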
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1 GAATTCTCT AGACTCCACCATtjGTTAGMCCAGAGTCL: 1 1 \ tUm CCTCTrC A T ClCn I C I IC GCTACAGTCCAATCGAGCGCTACACTCATCTCCACTTCATGCATr^ 120 
1 CTTAAGAGATCTGAGGTGGTACCAATCTTGGTCTCAG 120 



121 ACTGTACCAGTTCGGATGCTCAAGCMGTCAAAGTCrrGCTTACTGCMGM 240 

121 TCACATGCTCAAfVXTACGAfiTTCGTTCAGTTTCAGAA 240 

241 CGCTTTGATGAMCTTKCAGCCMTTXTTCMG^ 360 

241 GCGAAACTACTTTCAACGGTCGtnTACG^GTTC^^ 360 

361 GACA U I t O 1 1 1 C A CMC CGTTGATCGCAAATCAGACGGCCT ATCACT ACTrACTATTy^G GAAAAC TATGGGATCCAGTTC AATT^ I Lb H 1 1 480 

361 CTGTCAACAAAGTGTTGGCAAjCTACCGTTTACTC 480 

B ~ 2 te. 

481 C TT C TG GG ft GCAG I C C 1 1 A CCGCTGCAACTATCTTCAACATTCrCAAACGCGTATTCGGCMGAA 1 I C I G I I MG AAGTCTCTTATCTACCCAAGCGTTTACAAAGA 600 

481 GAAGACCCAGCGTCAGGMTGGCGGCGTTGATAGAACn 600 

601 CT^CAACGAGAGAACTTTCTAT^ 720 

601 GATGTTGCTCTCTTGAAACATAGAAACCTrr^ 720 

721 TAACA7CMGTTGCCACATCCTTACGATAGACCTAG 840 

721 ATTCTAGTTCAACGGTGTAGCMTXTATCTGGATCTACCTCTTCTAGTTACCGTAA^ 840 

- - CLJ 



841 CCGGjj£AACCCCTTCATCCCMTCACCGGATTC 960 

841 GGCCTTGTTGGGGAAGTAGGQTTACTGGCCTAACT 960 

== c ' 2 » ; ; • C-3 

961 TTCAGGAGTTAMCGAGGAGTATTCCAGTCTCTT^ 1080 

961 AAmrj.Tr^TmjTrr.Tr^^^ 1080 

1081 CCGAGGTT ATGAA A I C I IL ' I l A CTTATTC ACAAAG CCATCAACATt^TGmATCATAlgTAT^ ]200 

1081 GGCTCCAATACTTTAGAAGAATGAATAAGTGTTTn%TACTT^ 1200 

C-5 ^U-b ~ 

1201 CCI L I bLU Ltj ACAG b H L I b L tO AATmAOTATCATCATGAACGGAGGTCTAA^ 1 1 1 li I I GACCACAGATGATTCTAAGGTTATCAAGA 1C 1C7GTCAAGAAGCCTAA 1320 

1201 GGAGACGAAGCTGTCCAAGACGGCTTAACATGCATAGTAGTACTTGC^ 1320 

1321 GTTCTTt^GTATCAAGTGGGAGCATTnXCTATATCT 1440 

1321 CAAGAJVGTTCATAGTTCACOrrCGTAAACGGATA 1440 

1441 TMCAACCCAGATCAACTAACTATGT^^ 1560 

1441 ATT^nil ^ TCTAGTT^TTEATACATGCAGTrro 1560 

1561 ACCATATGGCGTAA C 1 G I OX I LACATTGCCAAACTT MG AG AMTCTAGTAGGAGTAGCTGCGGGCCTCGGCGTGGCA 1680 

1561 TGGTATACCGCATTGAOUjGGAGTGTAACGGTTTGAA I IL [LI 1 1 AGATCATCCTCATCGACGCC(%GAGCCGQICC^ • 1680 

U-b 

1681 CACTCATCAACTGCAXACMGTTCTACTGGATC JggO 

1681 GTGACTAGTTG^CGjnrrGTTCAA^ 1800 

1801 TGGGTCATCAGTGGAGGATACAAACTCAGiTC 1920 

1801 Aa^TAGTCACCTCCT A T a m f UGTCTACTCAG^ 1920 

1921 ATt^GAGATCAAATTtnCAGAACTCGAGAACA^ 2040 

1921 TAGTCTCTAGTTTAACAGTCTTTiAGCTCTTGTTGTTUT 2040 



E-6 

2041 GATAGATGTCGAACTAGAGGAGGAGAGTTTT ACTTGGTAAGAGCTCAAGCTT 2092 
2041 CTATCTACAGCnGATtTCCTCCTCTCAAM 2032 
SacI HindIII
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1 gaattctctagactccacc 19 

20 ATGGTTAGAAC CAGAGTCC 1 1 1 ICJUCCTCTTCATCTCI I ICI I CGCTACAGTCCMTCGACCGCTACACTCATCTCCACTTTUTGCATT 1 09 

I MVRTRVLFCLF I SFFATVGSSATL I STSC I 30 

110 TCTCAGGCTGCACmACCAGTTCGGATGCTCM 199 

31 SQAAL YQFGCSSKSKSCYCKNI NWLGSVTA 60 

200 TGCGCTTATGAGAACTCCAMTCTAACAAGACTCTTX^CTCCGCTTT^ 289 " 

61CAYENSKSNKTLDSALUICLASQCSS I KVYT 90 

290 CreGAGGACATGMGAAC^TCTACCTTMTTjCAAGTAACTACCTT 379 

91LEDMKM I YLNASNYLRAPEKSDKKTV VSGP 120 

380 TTWTO^WTCAGACGGCCTATCACTACTACTATGAGGW 469 

121 LUANE TAYHYYYEENYGI HLNLURSOWCA W 150 

470 GGCCTCGTCTTCTTCTGGGTCGCAGTC 559 

151 GLVFFWVAVLTAAT ILNI LKRVFG KN I U A N 180 

560 TCTGrTTAAGAAGTCTCTTATCTACCWAGCCnTTACAAA 649 

181 SVK KSL I YPSVYKDYNERTFYLW K RLPFNF 210 

650 ACMCTCGAGGCAAAGGACTCGTAGTTOTATCTTTG^ 739 

2HTTRGKGLVVL I F V I LTILSLSFGHNI KLPH 240 

740 CCTTACGATAGACCTAGATGGAGMGATCAATGGCATrCGTCT 1 1 i CCCCGTGGTGTAC 829 

241 PY DRPRWRRSMAFV.SRRAOLMA I ALFPVVY 270 

830 CI 1 1 1 CGGTATCCGGAACAACCCCTTCATCCCMTCACCGGATTGAGCTTTACT 919 

271 LFGIRNNPFI P I TGLSFSTFNFYHKWS.AYV 300 

920 TGmCATGTOGCCGTCGTCCATTCMTCGTTATGACCGCTT^ 1009 

301CFMLAVVHSI VUTASG. VKRGVFQSLVR KFY 330 

1010 TTCAGATGGGGAATAGTAGCCACMrTCTTATGTO ATGAAA7CTTC 1099 

331FRWGIVAT1LMSI I I FOSEKVFRNRGYE IF 360 

1100 TTACTTATTCACAAAXXCATGAACATCATGTTTATC^ 1189 

361 LL I HKAMN I MF I I AUYYHCHTLGWMGW I ffS 390 

1 1 90 ATTXXrTGGCATCCTCTGCTTCGACAGGTTCTT^ 1279 

391 MAG 1LCFDRFCR I VR I I UNGGLKTATLSTT 420 

1280 GATGATTCTAACGTTATCMGATCTCTGTCAAGAAGCCTAAG I ICI I CAAGT ATC AAGTGGGAGCATTTGCCTATATGTAC 1 1 IU 1 1 LA 1369 

421 ODSNVI K I SVKKPKFFKYQVGAF AYUYFLS 450 

1370 CCAAMTCAGCCTCGT7CTACAGTTTTCAATC 1459 

451 PKSAWFYSFQSHPFTVLSERHROPNNPOQL 480 

1460 ACTATGTACGTCAAAGCTMCAAGGGCATTACGAGAGTAC^ 1549 

481TUYVKA.NKG1 TRVLLSKVLSAPNHTVDCK 1 510 

1550 TTCTTAGAGGGACCATATGGCGTAACTGTCCCTCACAT^ 1639 

511 FLEGPYGVTVPH I AKLKRNLVGVAAGLGVA 540 

1640 GCCATCTACCCCCATTTCGTAGMTGCCTTAGATT^ 1729 

541 A I Y PHFVECL RLPSTOQLOHKFYW I V N DLS 570 

1730 CACCTTAAGTGGTTCGAAAACGAGCTACAATGGC TTMGGAGAAATCTTGTGMGTCTCTGTCATCT AC ACTGGGTCATCAGTGGAGGAT 1819 

571 HLKWFENELQWLKEKSCEVSVt YTGSSVED - 600 

1820 ACAMCTCAGATGAGTCCACTMGGGTTTCGATGACMGGMGMTCTGAM 1909 

601 TNSDESTKGFDDKEESEI TVECLNKRPOLK 630 

1910 GAGaAGTGAGATCAGAGATCAMTTGTCAGMCTCCUVGAW 1999 

631 ELVRSE I KLSELENNNITFYSCGPATFNDD 660 

2000 TTTAGGAATGCAGTTGTACAAGCTATCGATTCTAGTCT^^ 2089 

661FRNAVVQGI0SSLK 1 DVELEEESFTW* 687 

2090 ctt 2092 
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ftWcD&f* : tett£^RtetfcT*:#&;&tf ^Wffi&Mtf t-e-roflte^- 
File reference: JA908155

m m m ^ -.24. 03. 9 9 is te, o o is m is 

Filing date: 24 March 1999

fist : 6 6 3 7 -si- 

Priority date: 24 March 1998

Number of sequences: 34



SEQ ID NO: 1
Sequence length: 2092

imcd^c : mmm 

Topology: linear
Sequence features:
Feature key: modified base
Sequence:

GAATTCTCTA GACTCCACCA TGGTTAGAAC CAGAGTCCTT TTCTGCCTCT TCATCTCTTT 60 
CTTCGCTACA GTCCAATCGA GCGCTACACT CATCTCCACT TCATGCATTT CTCAGGCTGC 120 
ACTGTACCAG TTCGGATGCT CAAGCAAGTC AAAGTCTTGC TACTGCAAGA ACATCAATTG 180 
GCTCGGAAGC GTCACTGCAT GCGCTTATGA GAACTCCAAA TCTAACAAGA CTCTGGACTC 240 
CGCTTTGATG AAACTTGCCA GCCAATGCTC AAGTATCAAG GTTTACACAC TGGAGGACAT 300 






WO 99/48356 

GAAGAACATC TACCTTAATG CAAGTAACTA CCTTCGCGCT CCTGAGAAAT CCGATAAGAA 360
GACAGTTGTT TCACAACCGT TGATGGCAAA TGAGACGGCC TATCACTACT ACTATGAGGA 420
AAACTATGGG ATCCACTTGA ATTTGATGCG ATCTCAATGG TGCGCATGGG GCCTCGTCTT 480
CTTCTGGGTC GCAGTCCTTA CCGCCGCAAC TATCTTGAAC ATTCTCAAAC GCGTATTCGG 540
CAAGAACATT ATGGCAAATT CTGTTAAGAA GTCTCTTATC TACCCAAGCG TTTACAAAGA 600
CTACAACGAG AGAACTTTCT ATCTTTGGAA ACGTTTGCCA TTCAACTTTA CAACTCGAGG 660
CAAAGGACTC GTAGTTCTTA TCTTTGTCAT TCTGACTATT CTCTCACTCT CTTTCGGACA 720
TAACATCAAG TTGCCACATC CTTACGATAG ACCTAGATGG AGAAGATCAA TGGCATTCGT 780
CTCACGCCGT GCTGACTTGA TGGCAATCGC TCTTTTCCCC GTGGTGTACC TTTTCGGTAT 840
CCGGAACAAC CCCTTCATCC CAATCACCGG ATTGAGCTTT AGTACTTTCA ACTTTTACCA 900
CAAATGGTCA GCATACGTCT GCTTCATGTT AGCCGTCGTC CATTCAATCG TTATGACCGC 960
TTCAGGAGTT AAACGAGGAG TATTCCAGTC TCTTGTAAGG AAATTCTACT TCAGATGGGG 1020
AATAGTAGCC ACAATTCTTA TGTCCATCAT CATTTTCCAG TCCGAGAAGG TCTTCAGGAA 1080
CCGAGGTTAT GAAATCTTCT TACTTATTCA CAAAGCCATG AACATCATGT TTATCATAGC 1140
TATGTATTAC CATTGCCACA CACTAGGATG GATGGGCTGG ATCTGGTCCA TGGCTGGCAT 1200
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CCTCTGCTTC GACAGGTTCT GCCGAATTGT ACGTATCATC ATGAACGGAG GTCTTAAGAC 1260 

CGCCACTTTG TCGACCACAG ATGATTCTAA CGTTATCAAG ATCTCTGTCA AGAAGCCTAA 1320 
GTTCTTCAAG TATCAAGTGG GAGCATTTGC CTATATGTAC TTTCTTTCAC CAAAATCAGC 1380 
CTGGTTCTAC AGTTTTCAAT CTCATCCCTT CACAGTCCTA TCAGAAAGGC ACAGAGATCC 1440 
TAACAACCCA GATCAACTAA CTATGTACGT CAAAGCTAAC AAGGGCATTA CGAGAGTACT 1500 
TCTTAGCAAA GTTCTAAGCG CTCCAAACCA TACCGTTGAT TGCAAGATTT TCTTAGAGGG 1560 
ACCATATGGC GTAACTGTCC CTCACATTGC CAAACTTAAG AGAAATCTAG TAGGAGTAGC 1620 
TGCGGGCCTC GGCGTGGCAG CCATCTACCC CCATTTCGTA GAATGCCTTA GATTGCCTAG 1680 
CACTGATCAA CTGCAGCACA AGTTCTACTG GATCGTCAAC GACCTTAGTC ACCTTAAGTG 1740 
GTTCGAAAAC GAGCTACAAT GGCTTAAGGA GAAATCTTGT GAAGTCTCTG TCATCTACAC 1800 
TGGGTCATCA GTGGAGGATA CAAACTCAGA TGAGTCCACT AAGGGTTTCG ATGACAAGGA 1860 
AGAATCTGAA ATCACCGTAG AATGCCTTAA CAAGAGGCCA GACCTCAAAG AGCTAGTGAG 1920 
ATCAGAGATC AAATTGTCAG AACTCGAGAA CAACAACATC ACTTTCTACT CATGCGGACC 1980 
AGCGACTTTC AATGACGACT TTAGGAATGC AGTTGTACAA GGTATCGATT CTAGTCTGAA 2040 
GATAGATGTC GAACTAGAGG AGGAGAGTTT TACTTGGTAA GAGCTCAAGC TT 2092

SEQ ID NO: 2
Sequence length: 687
Sequence type: amino acid
Topology: linear
Molecule type: protein
Sequence:

Met Val Arg Thr Arg Val Leu Phe Cys Leu Phe Ile Ser Phe Phe 15

Ala Thr Val Gln Ser Ser Ala Thr Leu Ile Ser Thr Ser Cys Ile 30

Ser Gln Ala Ala Leu Tyr Gln Phe Gly Cys Ser Ser Lys Ser Lys 45

Ser Cys Tyr Cys Lys Asn Ile Asn Trp Leu Gly Ser Val Thr Ala 60

Cys Ala Tyr Glu Asn Ser Lys Ser Asn Lys Thr Leu Asp Ser Ala 75

Leu Met Lys Leu Ala Ser Gln Cys Ser Ser Ile Lys Val Tyr Thr 90

Leu Glu Asp Met Lys Asn Ile Tyr Leu Asn Ala Ser Asn Tyr Leu 105

Arg Ala Pro Glu Lys Ser Asp Lys Lys Thr Val Val Ser Gln Pro 120

Leu Met Ala Asn Glu Thr Ala Tyr His Tyr Tyr Tyr Glu Glu Asn 135

Tyr Gly Ile His Leu Asn Leu Met Arg Ser Gln Trp Cys Ala Trp 150

Gly Leu Val Phe Phe Trp Val Ala Val Leu Thr Ala Ala Thr Ile 165

Leu Asn Ile Leu Lys Arg Val Phe Gly Lys Asn Ile Met Ala Asn 180

Ser Val Lys Lys Ser Leu Ile Tyr Pro Ser Val Tyr Lys Asp Tyr 195

Asn Glu Arg Thr Phe Tyr Leu Trp Lys Arg Leu Pro Phe Asn Phe 210

Thr Thr Arg Gly Lys Gly Leu Val Val Leu Ile Phe Val Ile Leu 225

Thr Ile Leu Ser Leu Ser Phe Gly His Asn Ile Lys Leu Pro His 240

Pro Tyr Asp Arg Pro Arg Trp Arg Arg Ser Met Ala Phe Val Ser 255

Arg Arg Ala Asp Leu Met Ala Ile Ala Leu Phe Pro Val Val Tyr 270

Leu Phe Gly Ile Arg Asn Asn Pro Phe Ile Pro Ile Thr Gly Leu 285

Ser Phe Ser Thr Phe Asn Phe Tyr His Lys Trp Ser Ala Tyr Val 300

Cys Phe Met Leu Ala Val Val His Ser Ile Val Met Thr Ala Ser 315

Gly Val Lys Arg Gly Val Phe Gln Ser Leu Val Arg Lys Phe Tyr 330

Phe Arg Trp Gly Ile Val Ala Thr Ile Leu Met Ser Ile Ile Ile 345

Phe Gln Ser Glu Lys Val Phe Arg Asn Arg Gly Tyr Glu Ile Phe 360

Leu Leu Ile His Lys Ala Met Asn Ile Met Phe Ile Ile Ala Met 375

Tyr Tyr His Cys His Thr Leu Gly Trp Met Gly Trp Ile Trp Ser 390

Met Ala Gly Ile Leu Cys Phe Asp Arg Phe Cys Arg Ile Val Arg 405

Ile Ile Met Asn Gly Gly Leu Lys Thr Ala Thr Leu Ser Thr Thr 420

Asp Asp Ser Asn Val Ile Lys Ile Ser Val Lys Lys Pro Lys Phe 435

Phe Lys Tyr Gln Val Gly Ala Phe Ala Tyr Met Tyr Phe Leu Ser 450

Pro Lys Ser Ala Trp Phe Tyr Ser Phe Gln Ser His Pro Phe Thr 465

Val Leu Ser Glu Arg His Arg Asp Pro Asn Asn Pro Asp Gln Leu 480

Thr Met Tyr Val Lys Ala Asn Lys Gly Ile Thr Arg Val Leu Leu 495

Ser Lys Val Leu Ser Ala Pro Asn His Thr Val Asp Cys Lys Ile 510

Phe Leu Glu Gly Pro Tyr Gly Val Thr Val Pro His Ile Ala Lys 525

Leu Lys Arg Asn Leu Val Gly Val Ala Ala Gly Leu Gly Val Ala 540

Ala Ile Tyr Pro His Phe Val Glu Cys Leu Arg Leu Pro Ser Thr 555

Asp Gln Leu Gln His Lys Phe Tyr Trp Ile Val Asn Asp Leu Ser 570

His Leu Lys Trp Phe Glu Asn Glu Leu Gln Trp Leu Lys Glu Lys 585

Ser Cys Glu Val Ser Val Ile Tyr Thr Gly Ser Ser Val Glu Asp 600

Thr Asn Ser Asp Glu Ser Thr Lys Gly Phe Asp Asp Lys Glu Glu 615

Ser Glu Ile Thr Val Glu Cys Leu Asn Lys Arg Pro Asp Leu Lys 630

Glu Leu Val Arg Ser Glu Ile Lys Leu Ser Glu Leu Glu Asn Asn 645

Asn Ile Thr Phe Tyr Ser Cys Gly Pro Ala Thr Phe Asn Asp Asp 660

Phe Arg Asn Ala Val Val Gln Gly Ile Asp Ser Ser Leu Lys Ile 675

Asp Val Glu Leu Glu Glu Glu Ser Phe Thr Trp *** 687
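
The relationship between SEQ ID NO: 1 and SEQ ID NO: 2 can be spot-checked with a short script. The following is a minimal sketch, not part of the application: it assumes the standard genetic code and a reading frame that starts at the ATG at nucleotide 20 of SEQ ID NO: 1, and it translates only the first 109 nucleotides as printed in the listing. The names `CODON_TABLE`, `SEQ1_HEAD`, and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Nucleotides 1-109 of SEQ ID NO: 1, copied from the listing.
SEQ1_HEAD = (
    "GAATTCTCTAGACTCCACCATGGTTAGAACCAGAGTCCTTTTCTGCCTCTTCATCTCTTT"
    "CTTCGCTACAGTCCAATCGAGCGCTACACTCATCTCCACTTCATGCATT"
)


def translate(dna, start):
    """Translate dna from the 1-based position start until a stop codon or the end."""
    residues = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


peptide = translate(SEQ1_HEAD, start=20)
print(peptide)  # one-letter codes for the first 30 residues
```

The printed string can be compared with the first two rows of SEQ ID NO: 2 (Met Val Arg Thr Arg Val ... Ser Cys Ile) after converting three-letter to one-letter codes.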



SEQ ID NO: 3
Sequence length: 17
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Feature key: primer bind
Sequence:

GACTCGAGTC GACATCG 17 

SEQ ID NO: 4
Sequence length: 24
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

ACACTTATTA GCACTTCATG TATT 24 



SEQ ID NO: 5
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

GAATTCTCTA GACTCCACCA TGGTTAGAAC CAGAGTCCTT TTCTGCCTCT TCATCTCTTT 60
CTTCGCTACA GTCCAATCGA GCG 83



SEQ ID NO: 6
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Sequence features:
Feature key: primer bind
Sequence:

GTCCAATCGA GCGCTACACT CATCTCCACT TCATGCATTT CTCAGGCTGC ACTGTACCAG 60
TTCGGATGCT CAAGCAAGTC AAA 83



SEQ ID NO: 7
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

CAAGCAAGTC AAAGTCTTGC TACTGCAAGA ACATCAATTG GCTCGGAAGC GTCACTGCAT 60
GCGCTTATGA GAACTCCAAA TCT 83



SEQ ID NO: 8
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

TCCAGTGTGT AAACCTTGAT ACTTGAGCAT TGGCTGGCAA GTTTCATCAA AGCGGAGTCC 60
AGAGTCTTGT TAGATTTGGA GTT 83



SEQ ID NO: 9
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

TGTCTTCTTA TCGGATTTCT CAGGAGCGCG AAGGTAGTTA CTTGCATTAA GGTAGATGTT 60 
CTTCATGTCC TCCAGTGTGT AAA 83 





SEQ ID NO: 10
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

GGATCCCATA GTTTTCCTCA TAGTAGTAGT GATAGGCCGT CTCATTTGCC ATCAACGGTT 60 
GTGAAACAAC TGTCTTCTTA TCG 83 






SEQ ID NO: 11
Sequence length: 80
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind

GGATCCACTT GAATTTGATG CGATCTCAAT GGTGCGCATG GGGCCTCGTC TTCTTCTGGG 60
TCGCAGTCCT TACCGCCGCA 80





SEQ ID NO: 12
Sequence length: 80
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Feature key: primer bind

CCTTACCGCC GCAACTATCT TGAACATTCT CAAACGCGTA TTCGGCAAGA ACATTATGGC 60
AAATTCTGTT AAGAAGTCTC 80

SEQ ID NO: 13
Sequence length: 80
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Feature key: primer bind
Sequence:

GTTAAGAAGT CTCTTATCTA CCCAAGCGTT TACAAAGACT ACAACGAGAG AACTTTCTAT 60 
CTTTGGAAAC GTTTGCCATT 80 



SEQ ID NO: 14
Sequence length: 80
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Feature key: primer bind

AGAGTGAGAG AATAGTCAGA ATGACAAAGA TAAGAACTAC GAGTCCTTTG CCTCGAGTTG 60
TAAAGTTGAA TGGCAAACGT 80

SEQ ID NO: 15
Sequence length: 80
Sequence type: nucleic acid
Strandedness: single
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

AATGCCATTG ATCTTCTCCA TCTAGGTCTA TCGTAAGGAT GTGGCAACTT GATGTTATGT 60
CCGAAAGAGA GTGAGAGAAT 80



SEQ ID NO: 16
Sequence length: 80
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
TCCGGATACC GAAAAGGTAC ACCACGGGGA AAAGAGCGAT TGCCATCAAG TCAGCACGGC 60 
GTGAGACGAA TGCCATTGAT 80 



SEQ ID NO: 17
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

TCCGGAACAA CCCCTTCATC CCAATCACCG GATTGAGCTT TAGTACTTTC AACTTTTACC 60
ACAAATGGTC AGCATACGTC TGC 83





SEQ ID NO: 18
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind

GCATACGTCT GCTTCATGTT AGCCGTCGTC CATTCAATCG TTATGACCGC TTCAGGAGTT 60 
AAACGAGGAG TATTCCAGTC TCT 83 









SEQ ID NO: 19
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

TATTCCAGTC TCTTGTAAGG AAATTCTACT TCAGATGGGG AATAGTAGCC ACAATTCTTA 60
TGTCCATCAT CATTTTCCAG TCC 83



SEQ ID NO: 20
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Sequence features:
Feature key: primer bind
Sequence:

ATAAACATGA TGTTCATGGC TTTGTGAATA AGTAAGAAGA TTTCATAACC TCGGTTCCTG 60 
AAGACCTTCT CGGACTGGAA AAT 83 





SEQ ID NO: 21
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Sequence features:
Feature key: primer bind
Sequence:

GAGGATGCCA GCCATGGACC AGATCCAGCC CATCCATCCT AGTGTGTGGC AATGGTAATA 60
CATAGCTATG ATAAACATGA TGT 83



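SEQ ID NO: 22
Sequence length: 83
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

GTCGACAAAG TGGCGGTCTT AAGACCTCCG TTCATGATGA TACGTACAAT TCGGCAGAAC 60
CTGTCGAAGC AGAGGATGCC AGC 83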





SEQ ID NO: 23
Sequence length: 82
Sequence type: nucleic acid
Strandedness: single
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

GTCGACCACA GATGATTCTA ACGTTATCAA GATCTCTGTC AAGAAGCCTA AGTTCTTCAA 60
GTATCAAGTG GGAGCATTTG CC 82



SEQ ID NO: 24
Sequence length: 82
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

GGAGCATTTG CCTATATGTA CTTTCTTTCA CCAAAATCAG CCTGGTTCTA CAGTTTTCAA 60 
TCTCATCCCT TCACAGTCCT AT 82 





SEQ ID NO: 25
Sequence length: 82
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Sequence features:
Feature key: primer bind
Sequence:

TTCACAGTCC TATCAGAAAG GCACAGAGAT CCTAACAACC CAGATCAACT AACTATGTAC 60
GTCAAAGCTA ACAAGGGCAT TA 82

SEQ ID NO: 26
Sequence length: 82
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

CCTCTAAGAA AATCTTGCAA TCAACGGTAT GGTTTGGAGC GCTTAGAACT TTGCTAAGAA 60 
GTACTCTCGT AATGCCCTTG TT 82 





SEQ ID NO: 27
Sequence length: 82
Sequence type: nucleic acid
Strandedness: single
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

GGCCCGCAGC TACTCCTACT AGATTTCTCT TAAGTTTGGC AATGTGAGGG ACAGTTACGC 60
CATATGGTCC CTCTAAGAAA AT 82

SEQ ID NO: 28
Sequence length: 82
Sequence type: nucleic acid
Strandedness: single
Sequence features:
Feature key: primer bind
Sequence:

CTGCAGTTGA TCAGTGCTAG GCAATCTAAG GCATTCTACG AAATGGGGGT AGATGGCTGC 60 
CACGCCGAGG CCCGCAGCTA CT 82 



SEQ ID NO: 29
Sequence length: 77
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Feature key: primer bind
Sequence:

CTGCAGCACA AGTTCTACTG GATCGTCAAC GACCTTAGTC ACCTTAAGTG GTTCGAAAAC 60
GAGCTACAAT GGCTTAA 77

SEQ ID NO: 30
Sequence length: 77
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

ACAATGGCTT AAGGAGAAAT CTTGTGAAGT CTCTGTCATC TACACTGGGT CATCAGTGGA 60
GGATACAAAC TCAGATG 77



SEQ ID NO: 31
Sequence length: 77
Strandedness: single
Feature key: primer bind
Sequence:

CAAACTCAGA TGAGTCCACT AAGGGTTTCG ATGACAAGGA AGAATCTGAA ATCACCGTAG 60
AATGCCTTAA CAAGAGG 77



SEQ ID NO: 32
Sequence length: 77
Strandedness: single
Topology: linear
Sequence features:
Feature key: primer bind

GTGATGTTGT TGTTCTCGAG TTCTGACAAT TTGATCTCTG ATCTCACTAG CTCTTTGAGG 60
TCTGGCCTCT TGTTAAG 77



SEQ ID NO: 33
Sequence length: 77
Sequence type: nucleic acid
Strandedness: single
Topology: linear
Molecule type: other nucleic acid, synthetic DNA
Sequence features:
Feature key: primer bind
Sequence:

CGATACCTTG TACAACTGCA TTCCTAAAGT CGTCATTGAA AGTCGCTGGT CCGCATGAGT 60 
AGAAAGTGAT GTTGTTG 77 



SEQ ID NO: 34
Sequence length: 77
Sequence type: nucleic acid
Strandedness: single
Molecule type: other nucleic acid, synthetic DNA
Feature key: primer bind
Sequence:

AAGCTTGAGC TCTTACCAAG TAAAACTCTC CTCCTCTAGT TCGACATCTA TCTTCAGACT 60
AGAATCGATA CCTTGTA 77
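
Neighbouring primers in the listing share terminal sequence, which can be checked mechanically. The sketch below is illustrative only and is not taken from the application: it uses short end fragments copied from SEQ ID NOs: 5 to 8, and it assumes, from sequence comparison alone, that SEQ ID NO: 8 is written as the antisense strand. The function names `reverse_complement` and `suffix_prefix_overlap` are hypothetical helpers.

```python
# Complement map for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left, right):
    """Return the longest suffix of left that is also a prefix of right."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return right[:k]
    return ""


# End fragments copied from the listing (not the full primers).
SEQ5_TAIL = "CTTCGCTACAGTCCAATCGAGCG"   # last 23 nt of SEQ ID NO: 5
SEQ6_HEAD = "GTCCAATCGAGCGCTACACT"      # first 20 nt of SEQ ID NO: 6
SEQ7_TAIL = "GCGCTTATGAGAACTCCAAATCT"   # last 23 nt of SEQ ID NO: 7
SEQ8_TAIL = "AGAGTCTTGTTAGATTTGGAGTT"   # last 23 nt of SEQ ID NO: 8

# Same-strand neighbours: compare the sequences directly.
sense_overlap = suffix_prefix_overlap(SEQ5_TAIL, SEQ6_HEAD)

# Opposite-strand neighbours: compare against the reverse complement.
antisense_overlap = suffix_prefix_overlap(SEQ7_TAIL, reverse_complement(SEQ8_TAIL))

print(sense_overlap, len(sense_overlap))
print(antisense_overlap, len(antisense_overlap))
```

Running the same comparison on the full primer sequences, pair by pair, gives a quick way to spot transcription errors in any of the 30 assembly primers.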
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